{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "61a08247",
   "metadata": {},
   "source": [
    "# Real World Data Science Interview Assignment\n",
    "\n",
    "This assignment shows my work in roughly the order that I examined the data and built the model. I left it as such so that those analyzing the work could see my thought processes as I went through the assignment.\n",
    "\n",
    "## Executive  Summary\n",
    "\n",
    "A simple model flagging all transactions that occurred in one second and those that had a repeated device ID with price above 29 dollars appears to capture most of the potential value. More investigation needs to be done to determine if other models would be able to find more signal in the data to flag more transactions as fraud."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "734a62c4",
   "metadata": {},
   "source": [
    "## Read in data\n",
    "\n",
    "On read, several columns will be converted to a different type. The two time columns will be converted to datetime. The source, browser, and sex columns are converted to categorical as they have very low cardinality. Some memory is saved by using an 8-bit integer for the class column. The first column in the CSV is unnamed containing unique integers and is not read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d3b825e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>285108</td>\n",
       "      <td>2015-07-15 04:36:55</td>\n",
       "      <td>2015-09-10 14:17:56</td>\n",
       "      <td>31</td>\n",
       "      <td>HZAKVUFTDOSFD</td>\n",
       "      <td>Direct</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>M</td>\n",
       "      <td>49</td>\n",
       "      <td>2.818400e+09</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>131009</td>\n",
       "      <td>2015-01-24 12:29:58</td>\n",
       "      <td>2015-04-13 04:53:55</td>\n",
       "      <td>31</td>\n",
       "      <td>XGQAJSOUJIZCC</td>\n",
       "      <td>SEO</td>\n",
       "      <td>IE</td>\n",
       "      <td>F</td>\n",
       "      <td>21</td>\n",
       "      <td>3.251268e+09</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>328855</td>\n",
       "      <td>2015-03-11 00:54:12</td>\n",
       "      <td>2015-04-05 12:23:49</td>\n",
       "      <td>16</td>\n",
       "      <td>VCCTAYDCWKZIY</td>\n",
       "      <td>Direct</td>\n",
       "      <td>IE</td>\n",
       "      <td>M</td>\n",
       "      <td>26</td>\n",
       "      <td>2.727760e+09</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>229053</td>\n",
       "      <td>2015-01-07 13:19:17</td>\n",
       "      <td>2015-01-09 10:12:06</td>\n",
       "      <td>29</td>\n",
       "      <td>MFFIHYNXCJLEY</td>\n",
       "      <td>SEO</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>M</td>\n",
       "      <td>34</td>\n",
       "      <td>2.083420e+09</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>108439</td>\n",
       "      <td>2015-02-08 21:11:04</td>\n",
       "      <td>2015-04-09 14:26:10</td>\n",
       "      <td>26</td>\n",
       "      <td>WMSXWGVPNIFBM</td>\n",
       "      <td>Ads</td>\n",
       "      <td>FireFox</td>\n",
       "      <td>M</td>\n",
       "      <td>33</td>\n",
       "      <td>3.207913e+09</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id         signup_time       purchase_time  purchase_value  \\\n",
       "0   285108 2015-07-15 04:36:55 2015-09-10 14:17:56              31   \n",
       "1   131009 2015-01-24 12:29:58 2015-04-13 04:53:55              31   \n",
       "2   328855 2015-03-11 00:54:12 2015-04-05 12:23:49              16   \n",
       "3   229053 2015-01-07 13:19:17 2015-01-09 10:12:06              29   \n",
       "4   108439 2015-02-08 21:11:04 2015-04-09 14:26:10              26   \n",
       "\n",
       "       device_id  source  browser sex  age    ip_address  class  \n",
       "0  HZAKVUFTDOSFD  Direct   Chrome   M   49  2.818400e+09      0  \n",
       "1  XGQAJSOUJIZCC     SEO       IE   F   21  3.251268e+09      0  \n",
       "2  VCCTAYDCWKZIY  Direct       IE   M   26  2.727760e+09      0  \n",
       "3  MFFIHYNXCJLEY     SEO   Chrome   M   34  2.083420e+09      0  \n",
       "4  WMSXWGVPNIFBM     Ads  FireFox   M   33  3.207913e+09      0  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "sns.set_theme(rc={'figure.dpi': 144, \n",
    "                  'ytick.labelsize': 7, \n",
    "                  'axes.labelsize': 8, \n",
    "                  'xtick.labelsize': 7,\n",
    "                  'axes.titlesize': 9})\n",
    "\n",
    "dtype = {\n",
    "    'source': 'category',\n",
    "    'browser': 'category',\n",
    "    'sex': 'category',\n",
    "    'class': 'uint8'\n",
    "}\n",
    "df = pd.read_csv('../data/fraud.csv', \n",
    "                 usecols=range(1, 12), \n",
    "                 dtype=dtype,\n",
    "                 parse_dates=['signup_time', 'purchase_time'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f981f173",
   "metadata": {},
   "source": [
    "### Get metadata\n",
    "\n",
    "* 120k rows, 11 columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9c3b36ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(120000, 11)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48ccb37b",
   "metadata": {},
   "source": [
    "### Output data types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f89d7fb5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id                    int64\n",
       "signup_time       datetime64[ns]\n",
       "purchase_time     datetime64[ns]\n",
       "purchase_value             int64\n",
       "device_id                 object\n",
       "source                  category\n",
       "browser                 category\n",
       "sex                     category\n",
       "age                        int64\n",
       "ip_address               float64\n",
       "class                      uint8\n",
       "dtype: object"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0816ff9",
   "metadata": {},
   "source": [
    "### Check for missing values\n",
    "\n",
    "No missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "eade48d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id           0\n",
       "signup_time       0\n",
       "purchase_time     0\n",
       "purchase_value    0\n",
       "device_id         0\n",
       "source            0\n",
       "browser           0\n",
       "sex               0\n",
       "age               0\n",
       "ip_address        0\n",
       "class             0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4ebda3c",
   "metadata": {},
   "source": [
    "### Examine uniqueness\n",
    "\n",
    "Some repeating device ids and ip_addresses, but mostly unique."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6255b8f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id           120000\n",
       "signup_time       120000\n",
       "purchase_time     119729\n",
       "purchase_value       120\n",
       "device_id         110599\n",
       "source                 3\n",
       "browser                5\n",
       "sex                    2\n",
       "age                   57\n",
       "ip_address        114135\n",
       "class                  2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93afec4b",
   "metadata": {},
   "source": [
    "##   Read in IP Address mapping data\n",
    "\n",
    "The upper_bound_ip_address is read in as an integer, but needs to be converted to a float for the `merge_asof` pandas function to work with it. The country column is converted to categorical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "eb2f20ab",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>lower_bound_ip_address</th>\n",
       "      <th>upper_bound_ip_address</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16777216.0</td>\n",
       "      <td>16777471.0</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>16777472.0</td>\n",
       "      <td>16777727.0</td>\n",
       "      <td>China</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16777728.0</td>\n",
       "      <td>16778239.0</td>\n",
       "      <td>China</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>16778240.0</td>\n",
       "      <td>16779263.0</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>16779264.0</td>\n",
       "      <td>16781311.0</td>\n",
       "      <td>China</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   lower_bound_ip_address  upper_bound_ip_address    country\n",
       "0              16777216.0              16777471.0  Australia\n",
       "1              16777472.0              16777727.0      China\n",
       "2              16777728.0              16778239.0      China\n",
       "3              16778240.0              16779263.0  Australia\n",
       "4              16779264.0              16781311.0      China"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dtype = {\n",
    "    'upper_bound_ip_address': 'float64',\n",
    "    'country': 'category'\n",
    "}\n",
    "df_ip = pd.read_csv('../data/IpAddress_to_Country.csv', dtype=dtype)\n",
    "df_ip = df_ip.sort_values('lower_bound_ip_address')\n",
    "df_ip.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48b0f08c",
   "metadata": {},
   "source": [
    "### Get metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8f7ea80e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(138846, 3)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ip.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2008188e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "lower_bound_ip_address     float64\n",
       "upper_bound_ip_address     float64\n",
       "country                   category\n",
       "dtype: object"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ip.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "794d7b98",
   "metadata": {},
   "source": [
    "All IP Addresses in lookup table are unique."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "bd905f17",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "lower_bound_ip_address    138846\n",
       "upper_bound_ip_address    138846\n",
       "country                      235\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ip.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "007a2657",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "lower_bound_ip_address    0\n",
       "upper_bound_ip_address    0\n",
       "country                   0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ip.isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8137b177",
   "metadata": {},
   "source": [
    "### Check that IP Addresses are sorted and do not overlap\n",
    "\n",
    "We want to ensure each IPAddress range is mapped to exactly one country. The addresses need to be sorted as well for `merge_asof` to work. We stack them in one Series below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "4ab0cdc2",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    16777216.0\n",
       "0    16777471.0\n",
       "1    16777472.0\n",
       "1    16777727.0\n",
       "2    16777728.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = df_ip.iloc[:, :2].stack().droplevel(1)\n",
    "s.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c89c28b4",
   "metadata": {},
   "source": [
    "Verify they are sorted and do not overlap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "fa0ccfcb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.is_monotonic_increasing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "768db431",
   "metadata": {},
   "source": [
    "## Add country\n",
    "\n",
    "The `merge_asof` pandas function allows you to join two DataFrames on keys that don't match. The DataFrames must be sorted by their `left_on` and `right_on` columns. Below, the left (fraud) DataFrame will match with the last row of the right (ip addresses) DataFrame where the lower_bound_ip_address is less than the ip_address. First we sort the data and then perform the merge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d3b2343d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "      <th>lower_bound_ip_address</th>\n",
       "      <th>upper_bound_ip_address</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>119998</th>\n",
       "      <td>172984</td>\n",
       "      <td>2015-08-15 15:40:46</td>\n",
       "      <td>2015-10-30 09:47:39</td>\n",
       "      <td>9</td>\n",
       "      <td>TSDCMHPWAUZAR</td>\n",
       "      <td>Ads</td>\n",
       "      <td>IE</td>\n",
       "      <td>F</td>\n",
       "      <td>35</td>\n",
       "      <td>4.294822e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>3.758096e+09</td>\n",
       "      <td>3.758096e+09</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119999</th>\n",
       "      <td>168001</td>\n",
       "      <td>2015-03-03 11:27:19</td>\n",
       "      <td>2015-05-05 10:32:46</td>\n",
       "      <td>39</td>\n",
       "      <td>JLVKRXCKCWNLW</td>\n",
       "      <td>Ads</td>\n",
       "      <td>FireFox</td>\n",
       "      <td>F</td>\n",
       "      <td>41</td>\n",
       "      <td>4.294850e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>3.758096e+09</td>\n",
       "      <td>3.758096e+09</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        user_id         signup_time       purchase_time  purchase_value  \\\n",
       "119998   172984 2015-08-15 15:40:46 2015-10-30 09:47:39               9   \n",
       "119999   168001 2015-03-03 11:27:19 2015-05-05 10:32:46              39   \n",
       "\n",
       "            device_id source  browser sex  age    ip_address  class  \\\n",
       "119998  TSDCMHPWAUZAR    Ads       IE   F   35  4.294822e+09      0   \n",
       "119999  JLVKRXCKCWNLW    Ads  FireFox   F   41  4.294850e+09      0   \n",
       "\n",
       "        lower_bound_ip_address  upper_bound_ip_address    country  \n",
       "119998            3.758096e+09            3.758096e+09  Australia  \n",
       "119999            3.758096e+09            3.758096e+09  Australia  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.sort_values('ip_address')\n",
    "df_all = pd.merge_asof(df, df_ip, left_on='ip_address', right_on='lower_bound_ip_address')\n",
    "df_all.tail(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ead332ff",
   "metadata": {},
   "source": [
    "While this works for most of the rows, the ip address is not checked for an upper bound. As you can see above, there are rows where the ip_address is not in the interval. Below, we make the country value missing in the rows where the ip_address is not between the bounds. Missing values for country are filled with the string 'Unknown'. The data is then sorted by signup time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "5bf4ba65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>309557</td>\n",
       "      <td>2015-01-01 00:00:43</td>\n",
       "      <td>2015-01-01 00:00:44</td>\n",
       "      <td>14</td>\n",
       "      <td>BBPACGBUVJUXF</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>38</td>\n",
       "      <td>2.001426e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Korea Republic of</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>124539</td>\n",
       "      <td>2015-01-01 00:00:44</td>\n",
       "      <td>2015-01-01 00:00:45</td>\n",
       "      <td>14</td>\n",
       "      <td>BBPACGBUVJUXF</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>38</td>\n",
       "      <td>2.001426e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Korea Republic of</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>161246</td>\n",
       "      <td>2015-01-01 00:00:45</td>\n",
       "      <td>2015-01-01 00:00:46</td>\n",
       "      <td>14</td>\n",
       "      <td>BBPACGBUVJUXF</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>38</td>\n",
       "      <td>2.001426e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Korea Republic of</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>356414</td>\n",
       "      <td>2015-01-01 00:00:46</td>\n",
       "      <td>2015-01-01 00:00:47</td>\n",
       "      <td>14</td>\n",
       "      <td>BBPACGBUVJUXF</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>38</td>\n",
       "      <td>2.001426e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Korea Republic of</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>338656</td>\n",
       "      <td>2015-01-01 00:00:47</td>\n",
       "      <td>2015-01-01 00:00:48</td>\n",
       "      <td>14</td>\n",
       "      <td>BBPACGBUVJUXF</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>38</td>\n",
       "      <td>2.001426e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Korea Republic of</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id         signup_time       purchase_time  purchase_value  \\\n",
       "0   309557 2015-01-01 00:00:43 2015-01-01 00:00:44              14   \n",
       "1   124539 2015-01-01 00:00:44 2015-01-01 00:00:45              14   \n",
       "2   161246 2015-01-01 00:00:45 2015-01-01 00:00:46              14   \n",
       "3   356414 2015-01-01 00:00:46 2015-01-01 00:00:47              14   \n",
       "4   338656 2015-01-01 00:00:47 2015-01-01 00:00:48              14   \n",
       "\n",
       "       device_id source browser sex  age    ip_address  class  \\\n",
       "0  BBPACGBUVJUXF    Ads  Chrome   F   38  2.001426e+09      1   \n",
       "1  BBPACGBUVJUXF    Ads  Chrome   F   38  2.001426e+09      1   \n",
       "2  BBPACGBUVJUXF    Ads  Chrome   F   38  2.001426e+09      1   \n",
       "3  BBPACGBUVJUXF    Ads  Chrome   F   38  2.001426e+09      1   \n",
       "4  BBPACGBUVJUXF    Ads  Chrome   F   38  2.001426e+09      1   \n",
       "\n",
       "             country  \n",
       "0  Korea Republic of  \n",
       "1  Korea Republic of  \n",
       "2  Korea Republic of  \n",
       "3  Korea Republic of  \n",
       "4  Korea Republic of  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filt = df_all['ip_address'].between(df_all['lower_bound_ip_address'], \n",
    "                                         df_all['upper_bound_ip_address'])\n",
    "df_all['country'] = df_all['country'].where(filt)\n",
    "df_all['country'] = df_all['country'].cat.remove_unused_categories() \\\n",
    "                               .cat.add_categories('Unknown').fillna('Unknown')\n",
    "df_all = df_all.drop(columns=['lower_bound_ip_address', 'upper_bound_ip_address'])\n",
    "df_all = df_all.sort_values('signup_time', ignore_index=True)\n",
    "df_all.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d86a1f5",
   "metadata": {},
   "source": [
    "Find the number of missing values now. About 17k rows have IP addresses that are not found in our table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4713b568",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17418"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df_all['country'] == \"Unknown\").sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee1df11c",
   "metadata": {},
   "source": [
    "## Creating a hold-out test dataset\n",
    "\n",
    "Before going any further, we will split our dataset into training and testing sets. We will not look at the test set until the very end of our analysis, in which we will use to get a final evaluation of our model. Since transactions are coming in over time, we will not randomly split the data, and instead choose to hold out the last 20%. The data is currently sorted by signup_time so selecting the tail of the DataFrame will work.\n",
    "\n",
    "Splitting the data like this provides a much more realistic assessment of our analysis/model as it would happen in the real world."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2cafd694",
   "metadata": {},
   "outputs": [],
   "source": [
    "hold_out = len(df_all) // 5\n",
    "df_train = df_all.iloc[:-hold_out]\n",
    "df_test = df_all.iloc[-hold_out:]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b990ad0b",
   "metadata": {},
   "source": [
    "We verify the split happened by outputting the tail/head of the new DataFrames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b0cc7d8b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>95997</th>\n",
       "      <td>74448</td>\n",
       "      <td>2015-06-30 23:15:51</td>\n",
       "      <td>2015-07-29 15:38:55</td>\n",
       "      <td>29</td>\n",
       "      <td>ZNSKBVNQWQANG</td>\n",
       "      <td>Ads</td>\n",
       "      <td>IE</td>\n",
       "      <td>M</td>\n",
       "      <td>34</td>\n",
       "      <td>1.489732e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>Bulgaria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95998</th>\n",
       "      <td>233722</td>\n",
       "      <td>2015-06-30 23:16:50</td>\n",
       "      <td>2015-07-06 04:36:00</td>\n",
       "      <td>34</td>\n",
       "      <td>BYHIAEGRYUFTN</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>M</td>\n",
       "      <td>43</td>\n",
       "      <td>2.177280e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>United States</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95999</th>\n",
       "      <td>102254</td>\n",
       "      <td>2015-06-30 23:17:45</td>\n",
       "      <td>2015-10-04 21:52:34</td>\n",
       "      <td>46</td>\n",
       "      <td>AVGWGXJMCNHFY</td>\n",
       "      <td>Direct</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>37</td>\n",
       "      <td>2.056748e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       user_id         signup_time       purchase_time  purchase_value  \\\n",
       "95997    74448 2015-06-30 23:15:51 2015-07-29 15:38:55              29   \n",
       "95998   233722 2015-06-30 23:16:50 2015-07-06 04:36:00              34   \n",
       "95999   102254 2015-06-30 23:17:45 2015-10-04 21:52:34              46   \n",
       "\n",
       "           device_id  source browser sex  age    ip_address  class  \\\n",
       "95997  ZNSKBVNQWQANG     Ads      IE   M   34  1.489732e+09      0   \n",
       "95998  BYHIAEGRYUFTN     Ads  Chrome   M   43  2.177280e+09      1   \n",
       "95999  AVGWGXJMCNHFY  Direct  Chrome   F   37  2.056748e+09      0   \n",
       "\n",
       "             country  \n",
       "95997       Bulgaria  \n",
       "95998  United States  \n",
       "95999      Australia  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.tail(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "a484325b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>96000</th>\n",
       "      <td>193125</td>\n",
       "      <td>2015-06-30 23:18:23</td>\n",
       "      <td>2015-09-01 15:39:26</td>\n",
       "      <td>9</td>\n",
       "      <td>IFMCHPMYFWIMI</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>34</td>\n",
       "      <td>3.611508e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96001</th>\n",
       "      <td>294823</td>\n",
       "      <td>2015-06-30 23:19:04</td>\n",
       "      <td>2015-10-16 10:34:41</td>\n",
       "      <td>57</td>\n",
       "      <td>RIMPMUBEETXTZ</td>\n",
       "      <td>Ads</td>\n",
       "      <td>IE</td>\n",
       "      <td>M</td>\n",
       "      <td>33</td>\n",
       "      <td>1.934021e+08</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96002</th>\n",
       "      <td>91188</td>\n",
       "      <td>2015-06-30 23:20:13</td>\n",
       "      <td>2015-07-07 22:53:13</td>\n",
       "      <td>74</td>\n",
       "      <td>VHHMBZZYCJPDA</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>M</td>\n",
       "      <td>31</td>\n",
       "      <td>5.421422e+08</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       user_id         signup_time       purchase_time  purchase_value  \\\n",
       "96000   193125 2015-06-30 23:18:23 2015-09-01 15:39:26               9   \n",
       "96001   294823 2015-06-30 23:19:04 2015-10-16 10:34:41              57   \n",
       "96002    91188 2015-06-30 23:20:13 2015-07-07 22:53:13              74   \n",
       "\n",
       "           device_id source browser sex  age    ip_address  class  \\\n",
       "96000  IFMCHPMYFWIMI    Ads  Chrome   F   34  3.611508e+09      0   \n",
       "96001  RIMPMUBEETXTZ    Ads      IE   M   33  1.934021e+08      0   \n",
       "96002  VHHMBZZYCJPDA    Ads  Chrome   M   31  5.421422e+08      0   \n",
       "\n",
       "             country  \n",
       "96000  United States  \n",
       "96001  United States  \n",
       "96002  United States  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_test.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fa38a91",
   "metadata": {},
   "source": [
    "To further verify the split, the shape of each DataFrame is shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "7c76551c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(96000, 12)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "07dc3784",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(24000, 12)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1436b2a2",
   "metadata": {},
   "source": [
    "Write the training and test sets to disk as CSV files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "0ccad8b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train.to_csv('train.csv', index=False)\n",
    "df_test.to_csv('test.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8e7a4c8",
   "metadata": {},
   "source": [
    "## Summary statistics of simple continuous and categorical columns\n",
    "\n",
    "Summary statistics for some of the \"simpler\" continuous and categorical columns:\n",
    "\n",
    "* Continuous\n",
    "    * purchase_value\n",
    "    * age\n",
    "* Categorical\n",
    "    * source\n",
    "    * browser\n",
    "    * sex\n",
    "    * country\n",
    "    * class\n",
    "\n",
    "### purchase_value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "c62768ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>96000.000000</td>\n",
       "      <td>96000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>36.858146</td>\n",
       "      <td>33.134917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>18.354777</td>\n",
       "      <td>8.625734</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>9.000000</td>\n",
       "      <td>18.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>22.000000</td>\n",
       "      <td>27.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>34.000000</td>\n",
       "      <td>33.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>49.000000</td>\n",
       "      <td>39.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>154.000000</td>\n",
       "      <td>76.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       purchase_value           age\n",
       "count    96000.000000  96000.000000\n",
       "mean        36.858146     33.134917\n",
       "std         18.354777      8.625734\n",
       "min          9.000000     18.000000\n",
       "25%         22.000000     27.000000\n",
       "50%         34.000000     33.000000\n",
       "75%         49.000000     39.000000\n",
       "max        154.000000     76.000000"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train[['purchase_value', 'age']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c7aa4f5",
   "metadata": {},
   "source": [
    "A histogram, KDE, and boxplot are created to better understand the distribution of purchase value. The distribution appears fairly uniform between 10 and 40 and then drops off quickly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "bb32e1a9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqIAAAEuCAYAAABVm3DNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAABYlAAAWJQFJUiTwAABVjklEQVR4nO3de1xUZf4H8M+ZK8OM3G8i3q8ICkhiXtI2szS7qUu/LrqWlZtWtm39THP9eWkTM0tLRXPL1rStTFPT3TWtNtdSTEkQUxTMCyogdxhmYG7n98fI0dFREAeGYT7v14uXnvOc88z3fBP68pzzPEcQRVEEEREREVEzk7k7ACIiIiLyTixEiYiIiMgtWIgSERERkVuwECUiIiIit2AhSkRERERuwUKUiIiIiNyChSgRERERuQULUSIiIiJyCxaiREREROQWLESJiIiIyC1YiBIRERGRW7AQJSIiIiK3YCFKRERERG7BQpSIiIiI3ELh7gC8nc0mwmKxujuMZqdS2f/pmUwWN0fSsjAvzjEvzjEvzjEvzjEvzjEvzl2dF4VCDplMcPnnsBB1M4vFiooKo7vDaHahoW0AwCuv/UaYF+eYF+eYF+eYF+eYF+eYF+euzou/v0YqTl2Jt+aJiIiIyC1YiBIRERGRW7AQJSIiIiK3YCFKRERERG7hEYWoyWTC/fffj71790r7zp8/j0mTJiE+Ph6jRo3C7t27Hc5JS0vDAw88gLi4OEyYMAFnzpxxaF+3bh2GDh2KhIQEzJw5EwaDweHzZs+ejf79+2Pw4MH429/+1rQXSEREROSFWnwhWltbiz//+c/IycmR9omiiKlTpyIgIAAbN27EmDFjMG3aNOTl5QEA8vPzMWXKFDz44IPYtGkTQkJCMHXqVNhsNgDAzp07sXTpUsyZMweffPIJsrKysHDhQqn/RYsW4dChQ/j4448xb948rFy5Ev/85z+b98KJiIiIWrkWvXxTbm4uXnnlFYii6LA/LS0Np06dwqeffgqdTodu3bph79692LhxI15++WVs2LABvXr1wrPPPgsAWLBgAQYPHoy0tDQMGjQIa9euxfjx4zF8+HAAwNy5c/HUU0/htddegyAI2LBhA1atWoXY2FjExsbimWeewfr16zF69OhmzwHZ+ftrXLJ+mc0mcokOIiKiFqJFj4gePHgQgwcPxhdffOGwPzMzE71794ZOp5P2JSYmIiMjQ2rv37+/1KbRaBATE4NDhw7BarUiKyvLoT0+Ph5WqxXHjh1DdnY2TCYTEhMTHfrOysqCxcLFbt1FJhMgyATojeZGfwkyoUkW4yUiIqLGadEjoo8++qjT/UVFRQgLC3PYFxwcjIKCghu2FxYWorKyErW1tQ7tCoUCAQEBKCgogFKphL+/P9RqtdQeEhICs9mM0tLSa/q9VSqVQlo01hvdzLVX6Gvxr72nG/1Zowd3hr9O7RH59oQY3YF5cY55cY55cY55cY55ca6p89KiC9HrMRqNUCqVDvtUKhXMZrPUrlKprmk3mUyoqamRtp21WywWp22AfRITEREREbmGRxaiarUaer3eYZ/JZIKPj4/UfnXRaDKZEBAQII10Omv38fGBIAhO2wD7LX5XM5ksXvnMYt1vWEVFVQ06PjDQF6Ioorq6ttGfKYoiLBYrysoM9R/sJjebF2/BvDjHvDjHvDjHvDjHvDh3dV74is8rhIeHo6ioyGFfcXExQkND622vK0aLi4ulNovFgvLycoSFhSE8PByVlZUOxWhRURFUKhX8/f2b8KqIiIiIvItHjojGxcXhgw8+gMFggK+vLwAgPT0d8fHxUvvBgwel441GI44ePYopU6ZAJpOhT58+SE9Px6BBgwAAGRkZkMvliI6Ohkwmg1KpxKFDhzBgwACp75iYGCgUHpkuukSrUUIulyEw0PeW+uHMeyIiItfwyBHRpKQkREZGYsaMGcjJycHq1auRmZmJ5ORkAMC4ceOQmZmJ
lStXIjc3F7NmzUJkZCQGDhwIAHj88cexZs0a7Ny5E1lZWZg3bx7GjRsHrVYLjUaDhx9+GPPmzcPhw4fx3XffYc2aNfjDH/7gzksmF5AJAmwiOPOeiIiohfDIIT65XI7U1FTMmjULY8eORYcOHbB8+XJERUUBAKKiorBs2TKkpKRg1apViIuLQ2pqKmQye909evRonD9/HnPnzoXJZMKIESMwY8YMqf+ZM2di7ty5mDhxIrRaLZ5//nncd999brlWahiTxYoKvQlVBjOqjGZYLDZYrDYIAqBRK+CrViArtxjtw9vgq+9PNPpzkkf0hE6jrP9AIiIiqpcgXr1aPDUrb5is5GwxeoVCDgCwWKwN6kMul6Gy2oQNO7MB2CceVehNyC81oKjciHJ9w1Y0UKvkCNSpEBmiRUSQLxTym7spUFeINtWEJz407xzz4hzz4hzz4hzz4hzz4lxzTVbyyBFR8ix1i9FXVV8uFgXB/nKAhv4e5K+zr3ZgrLXgTEEVzhVXw1Bz8y8YqDVZUVBqREGpEXKZgMgQLaJCtQjxt6+YQERERM2HhSg1i6pqE77cdVza1mrthWVDlmMSRRG3x7XD1t0nceBoIa5XuvprVfDTquDnq4RaJYdcJkAUAUOtBdVGM2rMNhSWXh7JtNpE5F3UI++iHj4qOSKDtYgI9kWQnxoyFqVERERNjoUotVhmiw0XiqtxqqAS2/aeuaZdIRcQEeSLiCBfhAT4QHXpdv/1THqoD3LPlePDLVk4V1QNvdEstdWYrPgtvxK/5VdCpZAh/FK/oQE+N337noiIiBqGhSi1KDabiIvlRpwr0qOg1Aib7drxz2A/H3Ru2wbhQRrIZTdXJEYEa9GjfQC6R/mjXG/CuSI9zhdVw2SxSceYLDZppFQuExAaoEHbYF9Ehmhv+fqIiIjoMhai5HaiKF63KKyjUsowuG8kRKsNflqVk15ujiAICGyjRmAbNWI6BaG4sgYFJQYUlBpQY7o8gcpqE1FQat9/7EwZAv01uH9w51v+fCIiImIhSm6kN5hw4mw5zhXpUX2diUd+WiXah+ow88kkQBCkWfOuJJMJCAvQICxAgz5dglCuN6Gg1ID8EsM1t+/X78jGN/vPYMKIHojtEuzyWIiIiLwJC1FqVqIoorDUiN9+LURxufNlq3xUckSFahEVqpNGP3W+KoeisKlcOVIa3TEQeoMZF0qqcSq/ErVm+0htUZkR727IxKDYCDx+d3f4+nBdUSIiosZgIUrNprjciKNnypyu+amQC2gbrEX7UC2CW9BSSjpfJXr4BqBrpB/OXtTjVH6VVBDvPVKAE3nleO6hWHSJ9HNzpERERJ6HhSg1OavVhi92ncDeXwsd9gsCEBagQVSoDhFBGshb8Ox0uVyGzm398MexffH5zhP48fAFAEBxRQ1S1qfjkd91w923RbWYApqIiMgTsBClJlWhr8XiLzJw9FSptE8mAN07BCK6UxBsDXyzUkvhr1PjlSf6oW+XIHz872Mw1lphtYn47LscZJ8tw6TR0dDyVj0REVGDtNwhKPJ4pZU1SFn/i0MRGhaowfDEKPTrGQaN2nN/D7qtVxjmPNkfHSPaSPsO5RRj3scHcCKv3H2BEREReRAWotQkSitrsOgfh3Dx0oQkQQB6dQjAgGjPLkCvFBboi9fHJ2J4YpS0r7iiBgs//QVrd2TDUNP0k6uIiIg8GQtRcrkKfa1DEaqQC3jpfxLQo31Aq3uGUqmQ4YkRPfD8mFiHAnt3xgXMXJ2GHfvPotbkWY8fEBERNRcWouRSJrMVy77KkopQuUzA/45PRGKvMDdH1rQSe4bhjaeTEN8tRNpXZTBjw39y8dqqvdi+93SzLD9FRETkSVrHPVJqMv7+GshkDRvFFEUR7/7jEH67UAnAPinpf8cnYmCfSFRWX7tkU2sT5OeDF8f1wS8nivD5dzkoqawFAFQazPjqv79h+77TuKNPJP7nnp4IDdTcsK/AQN96P89mE1FR4Xwt
ViIiIk/AQpRuSCYTIMgEVDWgkPzqP7nSskYA8MTIXojxsrcPCYKAxJ5hiOsWgh8P52P7vtMovVSQmsw2fPfLOfz38AXce3tHjBrY6ZrXlQqC/Q1Toije8HPaaFW8nUFERB6PhSjVq6rahC93Hb/hMeeK9PjlRLG03SmiDcorjPhy13FMeqhPU4fY4ijkMtyZ0A5D+rbFz8cK8c3Peci7qAcAmC02bP/xFHamnUHfrsGIDNFK52m1agBAdXXtDftPHtETOg2XiSIiIs/GQpRuWWllDTJyLhehoQE+iO0S1OomJjWGQi7DoNi2GBgTgV9PlWLznlM4lW9/dMFkseHg8SJEllQjrmsIlAqOcRIRkXdhIUq3xFBrwc/ZF2G7dCdZp1Hitp6hkLEIdSAIAmK7BCOmcxByC6qwevMRlFTWAAAuFBugN+RjQO9waUSUiIjIG3AIhhrNarPhwLGLMJltAACVQoYBvcOgVMjdHFnLJQgCknpHYMHUQegQppP2VxrM2HM4H2WXilMiIiJvwEKUGkUURRw+WYqKS5OYBAHo3yuMr7dsIF8fJeK7hyC+ewjqBo9rTFZ8dzAPpSxGiYjIS7AQpUY5U6iXJt8AQGznIAT7+7gxIs/UIUyH23uHQyG3V6Nmiw3/ST/nFctdERERsRClm1ZaVYOs30qk7ahQLTpd8c51ujmhARoMjo2AUm7/djSZrdj7awEXwCciolaPhSjdlBqTFQezi1C3zKWfVoW+XYM5Q/4W+evUuD0mHAqpGLUh7ddCvh6UiIhaNRai1GA2m4j04xdRc6k4Uipk6N8rVCqe6NYEtlHjzn7tIL/0JitDrQX7jxXCYrW5OTIiIqKmwQqCGiz7bLn02koASOwRyslJLhYa6ItBfSOl7XK9Cb+cKK73TUtERESeiIUoNUhxRQ1yz1dI2706BCCsnvelU+NEhenQp0uQtF1QasDxvHL3BURERNREWIhSvaqNZhw6USRth/j7oHuUvxsjav06t/VDl0g/aftEXgUKSgxujIiIiMj1WIhSvdb+6xiMVzwXmtA9hJOTmkHvToEIuWJJrF9yilBl4LJORETUenh0IVpRUYFXX30VSUlJuOOOO7B48WJYrfaC6fz585g0aRLi4+MxatQo7N692+HctLQ0PPDAA4iLi8OECRNw5swZh/Z169Zh6NChSEhIwMyZM2EweOdo1IFjhdiXlS9tx3UNhkbNN8M2B5kg4LaeofC9lG+LVcSB7CKYLZy8RERErYNHF6Lz5s1DYWEh1q9fj7fffhtbtmzBxx9/DFEUMXXqVAQEBGDjxo0YM2YMpk2bhry8PABAfn4+pkyZggcffBCbNm1CSEgIpk6dCpvN/j/4nTt3YunSpZgzZw4++eQTZGVlYeHChe68VLeoNVnxty1HpO2oUC0iQ7RujMj7qJRy9O8VKs2k1xvNOJRTDJuNk5eIiMjzeXQhunv3bkycOBE9evTA7bffjvvvvx9paWlIS0vDqVOnMH/+fHTr1g2TJ09GQkICNm7cCADYsGEDevXqhWeffRbdunXDggULkJ+fj7S0NADA2rVrMX78eAwfPhx9+vTB3LlzsXnzZlRXV7vzcpvd1p9OoajcCMD+HvmYzkH1nEFNwV+nRly3YGm7oNSAr/f85saIiIiIXMOjC9GAgAB8/fXXMBqNKCwsxJ49exATE4PMzEz07t0bOp1OOjYxMREZGRkAgMzMTPTv319q02g0iImJwaFDh2C1WpGVleXQHh8fD6vVimPHjjXbtbnbuYt67Pw5T9ru3SkQaqXcjRF5t6hQncPkpa/+k4uMKyaQEREReSKPfthvzpw5mD59Ovr16webzYbbb78dL774IlJSUhAWFuZwbHBwMAoKCgAARUVFTtsLCwtRWVmJ2tpah3aFQoGAgADpfFdSqRQIDW15r8d8f1MWbJfWruzVKRC9Ot/C25MunabVqq9pcrbvZvtwRRwN7kIQoFDIb/m/mSBYbhiHs7b+vSNQ
XWNBYakBIoClXxzC+6/8DkF+Ptd20Eq1xO+VloB5cY55cY55cY55ca6p8+LRI6Jnz55F7969sX79eqxevRrnz5/HW2+9BaPRCKXScaF1lUoFs9n+7m6j0QiVSnVNu8lkQk1NjbTtrN0bZOUWIyPHPtomkwl46v4YzpJvAWQyAQP7tIWPyj4yXaE34Z1P02Hl86JEROShPHZE9OzZs1iwYAG+//57REREAADUajUmTZqE5ORk6PV6h+NNJhN8fHyk464uKk0mEwICAqBWq6Xt653vSiaTBRUVRpf321iiKGLNtssTlH7XLwqRIVr8dOjcLXRq/6O6+vJbmepG/K7cd7N9uCKOm+5CFGGxWFFW1vhVFAIDfSGKotM4GpKXft1DsO9oIUQROJxbjDVbDuPhO7o0Oh5PUPcbeVFRlZsjaVmYF+eYF+eYF+eYF+euzou/vwYqlevLRo8dET1y5Ai0Wq1UhAJAbGwsrFYrQkNDUVTk+PxccXExQkNDAQDh4eHXba8rRouLi6U2i8WC8vLya27nt0ZHTpUi95z9DUpymYBH7u7u5ojoaiEBGjw8rKu0ve2n0zh2utSNERERETWOxxaiYWFhqKysRH7+5TUuT548CQDo0qULsrOzHdb+TE9PR3x8PAAgLi4Ov/zyi9RmNBpx9OhRxMfHQyaToU+fPkhPT5faMzIyIJfLER0d3cRX5V6iKOKr/16ejT0sPhJhgb5ujIiu5+GhXRHb1T6TXgSwettRVFR7x6MjRETUenhsIRofH4/o6GjMnDkT2dnZyMjIwOzZs/HQQw/h3nvvRWRkJGbMmIGcnBysXr0amZmZSE5OBgCMGzcOmZmZWLlyJXJzczFr1ixERkZi4MCBAIDHH38ca9aswc6dO5GVlYV58+Zh3Lhx0Gpb9xqaR0+X4UyBfQheqZBh9MBO7g2IrksmE/Dyownw87U/C11RbcLftv3K9UWJiMijeOwzogqFAh988AEWLFiAiRMnQqlUYuTIkXj11Vchl8uRmpqKWbNmYezYsejQoQOWL1+OqKgoAEBUVBSWLVuGlJQUrFq1CnFxcUhNTYVMZq/LR48ejfPnz2Pu3LkwmUwYMWIEZsyY4c7LbTR/fw1ksoZNNNr7z8vLU92T1AFdOgRCLpdxolILpNUo4a9T4+XH+mH+mv0QRfsvEt9nXEDy8IY/TmGziS3qGWUiIvIuHluIAvZnPd977z2nbR07dsT69euve+6wYcMwbNiw67ZPnjwZkydPvuUY3U0mEyDIBFTVc9u2Ql+Ln3+9vDzVkPh20BvN8NfdwnJJ1GRkggCbCHRrH4AHhnSRFrj/bNdxdIr0Q3Sn+l8+0Ear8txbIkRE1Cp4dCFKDVNVbcKXu47f8JiccxXSMkCBbdTYl3keADDpoT5NHh81jt5g/+9qE0UE+alRWlkLUQTe/ccvGBYfWe8LCJJH9IROo7zhMURERE2JAyIEURRxtvDyshUdw3U3OJpaGpkgILFHKFQK+7dzjcmK9ONF0gsJiIiIWioWooSSihpU11gAAAq5gMiQ1j0pqzXSqBXo1yNE2i6uqMGvp7ikExERtWy8NU84c/Hy4v9RoToo5Pz95Hq0GiXkchkCb2FZq6aaABYW6Ise7f1xIs++Duyp/CroNEp0butXz5lERETuwULUy9lEEYWll9db5W35G6ubJKQ3mhvdR1NOAOvZPgB6gxkXSuz/TY/8VgpftQLhQVwPloiIWh4Wol6urKoWFqv9WUIflRx+WpWbI2r56iYJNVZTTgATBAHx3UNQXVOAimoTRAAHjhfh9t5hCPHXNNnnEhERNQbvwXq5orLLa0iGBWq4ZmgroJDLMCA6DL5q+++ZNpuI/Ucvoqzq+u+vJyIicgcWol7uYvkVhWgAR8xaCx+1AgNjw+Gjsi/hZLWJ2PdrAUora9wcGRER0WUsRL1YrcmKcr19oXsBQGiAj3sDIpfS+igxMCZcWtbJYhWx79dCFFewGCUiopaBhagX
u3I0NNBPDaXixgugk+dp46vCoNgIqJT2b3WrTcT+o4W4WMbXehIRkfuxEPVivC3vHfy0KgyOjZDetGS1ifj5WCF+OX7RzZEREZG3YyHqpURRvGaiErVebXxVGNwnAhq1vRi1icD7X2Rg7+ELbo6MiIi8GQtRL1WhN8FksQEAVEoZ/LlsU6un0ygxOLYtfH3ss+mtNhHvfnYIh0+WuDkyIiLyVixEvVTpFUv5hPpz2SZv4eujwODYCOg0SgD2YjR1SxZOnq9wc2REROSNWIh6qYrqy4VoYJume9MPtTwatQKDYsIR4m9fJcFktmHpl5m4UFzt5siIiMjbsBD1UnXLNgHgbXkv5KNWYPqE26Q3aVXXWLDsqywYay1ujoyIiLwJC1EvZLXaoDdcflc6C1Hv1DZEi9mTkqSlnQpLDfjon8cgiqKbIyMiIm/Bd817oUqDGXWlhtZHAYWCv494I61GiZ4dg/DC7+Pw7meHAAC/nCjCD4fzMfbObg3ux2YTUVHBdUmJiOjmsRD1QhX6y8+HBuj4fKi3kgkCbCKQ0DMM9w7ogG/2nwUArN+Rjc6R/ujePqDePtpoVbytQkREjcZC1AtVVPP5ULLTG0z4ctdxKGUCgtqoUVpVC1EE3vk0HcPiI6GQ37jMTB7RU5qBT0REdLM4mOGFHApRHQtRAmQyAf16hkIhty/jVV1jwdHTZW6OioiIWjsWol7GZhNRyRFRcsJXrUBsl2Bp+3RBFS6WGdwYERERtXYsRL1MldEM26WZSr5qBVSX3j9OBADtQ7WICPKVtjNzS2Cx2twYERERtWYsRL3MlROVeFueriYIAuK6BUN1aSUFo8mK7LPl7g2KiIhaLRaiXoYTlag+aqUcMZ2DpO3fLlSi/IpfYIiIiFyFhaiXcXijEkdE6TqiQrXSK0AB+y16Gxe6JyIiF2Mh6kVEUUSVgSOiVD9BENC3azBkMvss+opqE05dqHRzVERE1NqwEPUiJosNFqt9VEshF6DmRCW6AZ1GiR5R/tJ29tlyGPgueiIiciEWol6k2nj5/fJaHyUEQXBjNOQJurXzRxtf+4L1VpuIrJMlfBc9ERG5jEsK0S1btqCszPni10VFRfjoo49c8THXMJvNSElJwYABAzBgwADMmTMHJpP91vP58+cxadIkxMfHY9SoUdi9e7fDuWlpaXjggQcQFxeHCRMm4MyZMw7t69atw9ChQ5GQkICZM2fCYPD89RSray6PZml9+FItqp9MJiCu6+W1RQvLjMgv8fzvBSIiahlcUojOnDkTeXl5TtsOHz6MpUuXuuJjrrFo0SLs2rULqampWLlyJfbs2YMVK1ZAFEVMnToVAQEB2LhxI8aMGYNp06ZJMebn52PKlCl48MEHsWnTJoSEhGDq1Kmw2ezrJe7cuRNLly7FnDlz8MknnyArKwsLFy5skmtoTg4jonwtIzVQkJ8POka0kbaPnCqFxcK1RYmI6NY1elhs4sSJyMrKAmCfBDNx4kSnt3pramoQExPT+Aivo7KyEp999hk++OADJCYmAgBeeOEF/Otf/0JaWhpOnTqFTz/9FDqdDt26dcPevXuxceNGvPzyy9iwYQN69eqFZ599FgCwYMECDB48GGlpaRg0aBDWrl2L8ePHY/jw4QCAuXPn4qmnnsJrr70GrVbr8mtpLhwRpcbq3TEABSUG1JqtqDFZcTyv3GGJJyIiosZodDUye/Zs7NixA6IoYsWKFRg9ejQiIiIcjpHJZPDz88N99913y4FeLT09HT4+Phg0aJC0b+zYsRg7dixWrVqF3r17Q6fTSW2JiYk4ePAgACAzMxP9+/eX2jQaDWJiYnDo0CEMGDAAWVlZmDJlitQeHx8Pq9WKY8eO4bbbbnP5tTQXPUdEqZGUCjliOgXil5xiAPa1RduH6eo5i4iI6MYaXYh269YNL7zwAgD7Ui/JyckIDw93WWD1OXv2LNq1a4ft27dj1apVMBgMGDlyJF5++WUUFRUhLCzM4fjg
4GAUFBQAwHXbCwsLUVlZidraWod2hUKBgIAA6XxXUqkUCA1tU/+Bt0AQLPD1VTnMeA4N0kKjbsB//kuD3Fqt+hYCuH4fDe63ieNoaX3U278brqVHJxXOFVfjYpkRIoBfz9ifC1co5E3+b7hOc32Op2FenGNenGNenGNenGvqvLjk/mxdQVpRUQGj0Sg9a3mlyMhIV3yUpLq6GufOncP69esxb948VFdXY968ebBYLDAajVAqHUf8VCoVzGb7iKDRaIRKpbqm3WQyoaamRtp21u6pas1WmC8916eQC/BRcekmujmCIOC26HD8e99piCJQVGbET4cvYPTgLu4OjYiIPJRLCtHffvsNM2fOxOHDh69pE0URgiDg2LFjrvgoiUKhgF6vx9tvv40OHToAAKZPn47p06djzJgx0Ov1DsebTCb4+NjfFKNWq68pKk0mEwICAqBWq6Xt653vSiaTBRUVRpf3Wycw0BeiKKKopFra5+ujhMHQwKL60ko91dW38IpHJ33UjcI1uN8miqOl9dHgvLjpWhQC0CXSDyfP2xe3/8c3xzGkbyRMNeZ6zrw1db+RFxVVNenneBrmxTnmxTnmxTnmxbmr8+Lvr4FK5fr5JS7pce7cuSgoKMDrr7+OiIiIZlmfMiwsDAqFQipCAaBz586ora1FaGgoTpw44XB8cXExQkNDAQDh4eEoKiq6pr179+5SMVpcXIwePXoAACwWC8rLy6+5ne9JrpyopONEJboFPdsH4HxRNWpMVlRWm/CPncfx+6EcFSUiopvnkookMzMTixcvxogRI1zRXYPEx8fDYrHg+PHj6NmzJwDg5MmT0Gq1iI+Px0cffQSDwQBfX18A9slN8fHxAIC4uDhp4hJgv1V/9OhRTJkyBTKZDH369EF6ero0ESojIwNyuRzR0dHNdn2uxqWbyFUUchliOwfh4HH7L3M79p1G/x6hDks8ERERNYRL1hENDg6GXN68zxx26tQJw4cPx8yZM3HkyBEcPHgQixcvxiOPPIKBAwciMjISM2bMQE5ODlavXo3MzEwkJycDAMaNG4fMzEysXLkSubm5mDVrFiIjIzFw4EAAwOOPP441a9Zg586dyMrKwrx58zBu3Dgu3UR0SdtgX4QG2B9VsYnAup3HYeMbl4iI6Ca5pBB98sknsXz5cpSUlLiiuwZbtGgRevbsiYkTJ+L555/HiBEj8Morr0AulyM1NRWlpaUYO3Ystm7diuXLlyMqKgoAEBUVhWXLlmHr1q0YN24ciouLkZqaCpnMno7Ro0djypQp0vqhsbGxmDFjRrNem6tV1zi+3pPoVgiCgD5dgqGQ2x/D+e1CJX48nO/mqIiIyNO4ZGjs4MGDyMvLw9ChQxEZGXnNpB5BEPD111+74qMc6HQ6pKSkICUl5Zq2jh07Yv369dc9d9iwYRg2bNh12ydPnozJkye7JE53E0XxqjVEOSJKt06nUWL04M7Y+t/fAAAbfziJfj1CoeOjH0RE1EAuqUi0Wi3uvvtuV3RFTUBvMMNitd82lcsEqJVcuolc44EhXbAvKx8Xy4zQG83Y+MNJPDmql7vDIiIiD+GSQtTZiCS1HAWlBunvWo2iWVY1IO+gVsnxzIOxWLD2AABgT+YF3BHXFl0j/d0cGREReQKXFKIHDhyo95grX6lJzavwykKUz4eSi/XvHY74biHIyC2GCGD9Nycwe+JtkMn4Cw8REd2YSwrRCRMmQBAEiFfNmr1y5M3VC9pTwxWVXVmI8vlQcr3H7u6OX0+Xwmyx4UxhFf5z6DyGJ0a5OywiImrhXFKVbNmy5Zp91dXVOHjwID777DO8//77rvgYaqTSyhrp7w16vzzRTQoN0OD+gR2xec8pAMBX//0Nt/UKg79WVc+ZRETkzVxSlfTq5XxyQmJiItRqNd5++22sW7fOFR9FjVBSwUKUmt7IAR2x90gBCsuMMNZasOH7XDz7QG93h0VERC2YS9YRvZHo6Gin76Cn5sNClJqDUiHD
E/f0kLb3/VqAzNxiN0ZEREQtXZNWJXq9Hp9++qn0jndqfqIoOhaiKi7dRK6j1Sghl8sQGGh/le4dgb74ObsIP2ZeAAB88s1xLO0dAb96btHbbCIqKoxNHi8REbUsLilEExISrlkSSBRF1NTUQBRFvPnmm674GGqEaqMZtWYrAPsaokpFkw+CkxeRCQJsIhxemPD4PT2RlVuMimoTyqpqseqrw5j6+7jr9tFGq2r6WzNERNQiuaQQnTRpktO1KXU6HYYOHYouXbq44mOoEYrKHW/Lcw1RcjW9wYQvdx132NezQwB+PnYRALDvSAFMJivahWqdnp88oiffxkRE5KVcUoi++OKLruiGmkDxFbc7NWrelqfmERHki/ZhOuRd1AMAMk8WI6CNiuvYEhGRA5c9I1pUVIQ1a9bgwIED0Ov1CAgIQGJiIv7whz8gPDzcVR9DN6m4/IpCVMWJStR8YjsHoaSiBoZaCyxWEenHizCkT1sudE9ERBKXPJp15swZPPzww9iwYQMiIiIwYMAABAcH47PPPsPDDz+MM2fOuOJjqBEcClHOmKdmpFTIkNgzFHVPg5TrTTh2psy9QRERUYviksrkrbfeQnBwMNauXYvAwEBpf2lpKZ5++mksXrwYy5Ytc8VH0U1yLER5a56aV2AbNaI7BuLoaXsBevJCJQL91IgMdv68KBEReReXjIimpaXhhRdecChCASAoKAjPPfcc9u/f74qPoUYo5hqi5GZdI/0QFqiRtg+dKEaVweTGiIiIqKVwSSGq0WggkznvSiaTwWKxuOJjqBGKeGue3EwQBPTrHgJfH/u/P6tNxM/HLsJssbo5MiIicjeXFKK33XYbUlNTUVFR4bC/vLwcqampSEpKcsXH0E2y2USUcjF7agFUSjn69wqD/NJEpeoaC345UQxRFN0cGRERuZNLhsimT5+O3//+97jrrrswYMAAhISEoLi4GPv374dCocDixYtd8TF0kyqqTbDa7P+jVylkkMu5bDi5j79WhfjuIUg/XgQAKCwz4vjZcvcGRUREbuWSyqRdu3bYsmULkpOTcfHiRaSlpaG4uBjJycnYunUrunbt6oqPoZtUWsnnQ6llaReiRbd2ftL2iXMVOHis0I0RERGRO91SdSKKIrZt24aAgAAMHToUM2bMAADYbDZMmjQJPXr0QEREhEsCpZtX4lCI8rY8tQzRHQNRUW2S3vr1weYsdGrrB38f/rJERORtGj0iarFY8NJLL+G1115DWlqaQ1tJSQmKioowa9YsvPLKK7DZbLccKN280spa6e8cEaWWQhAEJPYIhe+lf5M1Jiv++vHPKKuqredMIiJqbRpdiH7xxRfYvXs33nnnHUyfPt2hLTQ0FP/85z+xcOFCfPPNN9i0adMtB0o3z+HWPN+qRC2ISilHUnQYFHL75KWSihq8tzETNSausEFE5E0aXYhu3LgRTz/9NO67777rHvPQQw/hsccew+eff97Yj6FbUFrFEVFqufy0KtzWMwyyS69eOluox+qvj8Jm40x6IiJv0ehC9MyZM+jfv3+9x91xxx04ffp0Yz+GbgGfEaWWLixQgyfvj5a2M3KL8fl3OW6MiIiImlOjC1EfHx8YDIZ6jxNFEUqlsrEfQ7egjLPmyQP8LrE9xtx5eWWNb9PPYdfBPDdGREREzaXRhWh0dDS+//77eo/77rvv0KlTp8Z+DN0Cn0vPhbbxVUHNxeypBRt/by/c1itM2v782xwcOlHkxoiIiKg5NLoQfeyxx7B582Z8+eWX1z1m48aN2LRpE8aOHdvYj6FbMHFkTwzu2xbPje0jPYdH1BLJZAKeGR2NrpfWGBUBfPD1rzh5oeLGJxIRkUdr9P3au+++G//zP/+D2bNn49NPP8WwYcMQGRkJm82G/Px87NmzB9nZ2Rg5ciQeeeQRV8ZMDRTdKQiDEqKgN5px/FSJu8MhuiGVUo4Xx/XFgk/ScbHcCJPFhvc3HsasCYkIDW3j7vCIiKgJ3NKD
g3PmzEFcXBw++ugjfPDBBw5tvXv3RkpKCh5++OFb+QgiauW0GiXkchkCA30RGOiLOc8OwIwVP6HKYEaVwYz3NmXh7RcD4K9TIzDQ97r92GwiKiqMzRg5ERHdqlt+xefDDz+Mbdu2Yc+ePfjyyy/x1VdfYd++ffjqq6+arQidNWsWJkyYIG2fP38ekyZNQnx8PEaNGoXdu3c7HJ+WloYHHngAcXFxmDBhAs6cOePQvm7dOgwdOhQJCQmYOXNmgyZlEVHjyAQBNhHQG83QG83w06rx8mP9oFTYfzzlF1dj3odpKCozSMdc/SXIBMhkfPyEiMjTuGwqdWhoKEJDQ13VXYPt27cPGzduRFJSEgD7LP2pU6eia9eu2LhxI77//ntMmzYN27dvR/v27ZGfn48pU6Zg6tSp+N3vfocVK1Zg6tSp2LZtG2QyGXbu3ImlS5di0aJFCAsLw8yZM7Fw4ULMnz+/2a+NyFvoDSZ8ueu4w764bsE4mG2fsJSTV47/W70PCd2CITh53jl5RE/oNFydg4jI09zyiKg7GQwGzJ49G/369ZP2paWl4dSpU5g/fz66deuGyZMnIyEhARs3bgQAbNiwAb169cKzzz6Lbt26YcGCBcjPz5deU7p27VqMHz8ew4cPR58+fTB37lxs3rwZ1dXVbrlGIm8VGaxFbOcgafvcRT2OnSlzY0RERORqHl2ILlmyBElJSdJoKABkZmaid+/e0Ol00r7ExERkZGRI7VcuxK/RaBATE4NDhw7BarUiKyvLoT0+Ph5WqxXHjh1r+gsiIgddIv3Qpa2ftJ17vhJ5F/VujIiIiFzJY1c5P3ToEHbs2IHt27djzZo10v6ioiKEhYU5HBscHIyCgoIbthcWFqKyshK1tbUO7QqFAgEBAdL5rqZSKZp8RrAgWKDVqht5sv2PRp9fTx8N7reJ42hpfdTbvwddy6320T82AjVmKy4U2+9KZOaWIDjQF6EBmstdCAIUCrnXzq731uuuD/PiHPPiHPPiXFPnxSNHRE0mE2bNmoXXX38d/v7+Dm1Go/GaNzmpVCqYzWapXaVSXdNuMplQU1MjbTtrJ6LmJxMEDOrbFv5a+/elTRSxJ+M8qo1mN0dGRES3yiNHRFesWIGOHTti1KhR17Sp1Wro9Y637kwmE3x8fKT2q4tKk8mEgIAAqNVqaft657uayWRp0iVnAgN9IYoiqqtrG9eBaP+j0edfp4+60a8G99tEcbS0PhqcFw+4Flf2odWqMTShHb5JOwOTxYZakxU//JKHIX3aQiGXQRRFWCxWlJV51woXdSMVRUVVbo6kZWFenGNenGNenLs6L/7+GqhUri8bPXJEdNu2bfjxxx+RkJCAhIQEfPTRRzh48CASEhIQHh6OoiLHVwMWFxdLM/pv1F5XjBYXF0ttFosF5eXl19zOJ6LmpfNVoX+vMNRNmq+sNuOXE8UQRdG9gRERUaN5ZCG6bt06bN++HVu2bMGWLVuQnJyM2NhYbNmyBXFxccjOznZY+zM9PR3x8fEAgLi4OPzyyy9Sm9FoxNGjRxEfHw+ZTIY+ffogPT1das/IyIBcLkd0dHSzXR8RORfs74O+XYOl7YJSA7LPlrsvICIiuiUeWYi2a9cOHTt2lL78/Pzg4+ODjh07IikpCZGRkZgxYwZycnKwevVqZGZmIjk5GQAwbtw4ZGZmYuXKlcjNzcWsWbMQGRmJgQMHAgAef/xxrFmzBjt37kRWVhbmzZuHcePGQavVuvOSieiSjuFt0CXy8kz6nHMVSDuS78aIiIiosTyyEL0RuVyO1NRUlJaWYuzYsdi6dSuWL1+OqKgoAEBUVBSWLVuGrVu3Yty4cSguLkZqaipkMnsqRo8ejSlTpmDu3Ll46qmnEBsbixkzZrjzkojoKjGdAhF2xaz5v205gpPnK9wYERERNYZHTla62ssvv+yw3bFjR6xfv/66xw8bNgzDhg27bvvkyZMxefJkl8VHRK4lCAISe4bgv5n5qK6x
wGSxYeHaA5j1h9uk2fVERNTytboRUSLyDkqFHAOiw6GQ22cvFVfUYMXmLJgtNjdHRkREDcVClIg8ls5XicSeodJM+txzFVi/8zhn0hMReQgWokTk0cIDffE/d/eQtvcczsd36efcGBERETUUC1Ei8nj3DeqEO/u1k7Y//y4XR0+XujEiIiJqCBaiROTxBEHAlLF90bmtfVknmyhi5ZYjuOhlb1oiIvI0LESJqFVQKeV4YWwf+Ovss+arayx4f1MWDDUWN0dGRETXw0KUiFqNwDZqvDi2LxRy+4+2C8XVWLn1CKw2zqQnImqJWIgSUavSJdIPk+7rJW3/eqoUn+7K4Ux6IqIWiIUoEbU6t8dE4MHBnaTtHw6dx64Dee4LiIiInGIhSkSt0kNDOuP23uHS9hff5+JQTpEbIyIioquxECWiVkkQBDx1Xy90i/IHAIgAPvj6V5wpqHJvYEREJGEhSkStllJhn0kfGuADADCZbXhvYyZKK2vcHBkREQEsRImolfPzVeFPyXHwVSsAAOV6E97feBg1Ji7rRETkbixEiajVaxusxfNjYiGX2V9Kf/aiHh9s/RU2G2fSExG5EwtRIvIK0Z2C8Id7e0rbmSdL8MX3uW6MiIiIWIgSkde4Iy4S993eUdredTAP3/9yzo0RERF5NxaiRORVxg7rgsSeodL2p7tOIOu3EjdGRETkvViIEpFXkQkCnrm/Nzq39QMAiCKwcssRnLuod3NkRETeh4UoEXkdtVKOaeP6INhPDQCoMVnx3sZMVOhr3RwZEZF3YSFKRF7JX6fGS7+Pg49KDgAoqazF+5sOo9ZsdXNkRETeg4UoEXmtqDAdpjwcC5lgX9bpVH4VPtx+FDaRyzoRETUHhbsDICK6VVqNEnK5DIGBvjd97tBEXxhMVnyw5QgAIP14ETbtPonkO7u5OkwiIroKC1Ei8ngyQYBNBPRGc6POHzW4My4UV2Pbj6cAAP9OO4vwQF8MjYt0ZZhERHQVFqJE1CroDSZ8uet4o85NHtETE0f3Rl5BFTJyiwEA6745Dn+tCnHdQlwZJhERXYHPiBIRAZDLBEx+sDc6hOkAAFabiBWbs5B5qTAlIiLXYyFKRF6v7hnTtuF++L9nBiA8yP6sqcUqYsXmI8gtqEJgoG+9X/7+GjdfCRGRZ2EhSkRe78pnTNUqBWb84TaEBtiLSovVhrc+OYhv9p+B3mi+7pcgEyCTCW6+EiIiz8JnRImIcO0zpnHdgrE3qwCGWgusNhGrvsrCnkPn0DXS3+n5ySN6QqdRNle4REStAkdEiYic8FUrMLhPBNr4Xi4ufz1VhqOnSyFynVEiIpfw6EL07NmzeO6559C/f38MHToUCxcuRG2t/RV958+fx6RJkxAfH49Ro0Zh9+7dDuempaXhgQceQFxcHCZMmIAzZ844tK9btw5Dhw5FQkICZs6cCYPB0GzXRUQtg+ZSMRrURi3tyz1ficzcEi56T0TkAh5biJpMJjz33HNQqVT4/PPPsXjxYnz77bdYsmQJRFHE1KlTERAQgI0bN2LMmDGYNm0a8vLyAAD5+fmYMmUKHnzwQWzatAkhISGYOnUqbDYbAGDnzp1YunQp5syZg08++QRZWVlYuHChOy+XiNxEpZDj9phwhAdenoh09qIeB7Mvwmq1uTEyIiLP57GF6OHDh3H27FmkpKSga9euSEpKwksvvYRt27YhLS0Np06dwvz589GtWzdMnjwZCQkJ2LhxIwBgw4YN6NWrF5599ll069YNCxYsQH5+PtLS0gAAa9euxfjx4zF8+HD06dMHc+fOxebNm1FdXe3OSyYiN1HIZegfHYb2l5Z2AoCCUiP2HS2EycJ30xMRNZbHFqJdunTB6tWrodVqpX2CIMBkMiEzMxO9e/eGTnf5fxqJiYnIyMgAAGRmZqJ///5Sm0ajQUxMDA4dOgSr1YqsrCyH9vj4eFitVhw7dqzpL4yIWiSZICC+WzC6tfOT9pVW1uKnrAIYay1u
jIyIyHN57Kz5oKAgDBo0SNq22WxYv349EhMTUVRUhLCwMIfjg4ODUVBQAADXbS8sLERlZSVqa2sd2hUKBQICAqTzXUmlUiA0tI3L+72SIFig1arrP9DpyfY/Gn1+PX00uN8mjqOl9VFv/x50La7s47rH3GocN3F+/5i2aKNT49DxIgBAlcGMvUcKMGJgJ0R3Cm7y72dn3PGZnoB5cY55cY55ca6p8+KxI6JXS0lJwbFjx/DKK6/AaDRCqXRcRkWlUsFstr+H2mg0QqVSXdNuMplQU1MjbTtrJyLq1TEIt8dGQLhUwFbXWPDGR/uRk1fm3sCIiDyMx46I1hFFEW+++SY+++wzvPfee+jevTvUajX0er3DcSaTCT4+PgAAtVp9TVFpMpkQEBAAtVotbV/vfFcymSyoqDC6vN86gYG+EEUR1dW1jevg0sTgRp9/nT7qRp4a3G8TxdHS+mhwXjzgWlzZR715udU4GnF+mL8PkqLDcDC7CFabiCqDGa+n/oTnx/ZBTKegxsVxk+pGKoqKqprl8zwF8+Ic8+Ic8+Lc1Xnx99dApXJ92ejRI6I2mw2vv/46Pv/8cyxZsgR33303ACA8PBxFRUUOxxYXFyM0NLTe9rpitLj48vulLRYLysvLr7mdT0TeLTzQFwNjw6FU2H+U1pisWLohEz8fK3RzZEREnsGjC9GFCxdi27ZtWLZsGe655x5pf1xcHLKzsx3W/kxPT0d8fLzU/ssvv0htRqMRR48eRXx8PGQyGfr06YP09HSpPSMjA3K5HNHR0U1/UUTkUYLa+GBInwgE+dnvmFhtIj7Y+it2/nyWC98TEdXDYwvRjIwMrF27FtOmTUNsbCyKioqkr6SkJERGRmLGjBnIycnB6tWrkZmZieTkZADAuHHjkJmZiZUrVyI3NxezZs1CZGQkBg4cCAB4/PHHsWbNGuzcuRNZWVmYN28exo0b5zBDn4ioThtfFf7v6SREXVreSQTw+fe5WLvjOCxca5SI6Lo8thD95ptvAADvvPMOhgwZ4vAliiJSU1NRWlqKsWPHYuvWrVi+fDmioqIAAFFRUVi2bBm2bt2KcePGobi4GKmpqZDJ7OkYPXo0pkyZgrlz5+Kpp55CbGwsZsyY4bZrJaKWL9hfgzefG4SuVyzv9N/MC3jn8wzojWY3RkZE1HJ57GSl1157Da+99tp12zt27Ij169dft33YsGEYNmzYddsnT56MyZMn31KMRORd/LQqTH8sAX//dzb2/Wp/TvR4Xjn+uvYgpv2+LyJDeFeFiOhKHjsiSkTUkmg1SsjlMoSFtsH/TrgN40f2kpZ3ulhuxIJ16ThZUIXAQN8bfvn7a278QURErQgLUSIiF5AJAmwioDeaUV1jwb23d8S0R+KhVsoBAIZaC978+wH8besRlOtroTear/kSZAJkMsHNV0JE1Hw89tY8EVFLozeY8OWu4w77bo8Jx89HC2E02d9J/6+9p7EvKx+JPULh6+P4Izh5RE/oNI4v4yAias04IkpE1IT8tSoMjY9EWODlW+5lVbXYnXkB+SWGG5xJRNT6sRAlImpiaqUcA6LD0LtjYN1r7WG22HAg+yKyfivhEk9E5LVYiBIRNQNBENAtyh+D+0RAo5ZL+0/lV+G/mfkoq7qF16QSEXkoFqJERM0oyM8Hw+IiERF0+Va93mjGj4fz8dUPuRwdJSKvwkKUiKiZqZRy9O8VhriuwZBfmiUvAtj8w0nMTP0J+SXV7g2QiKiZsBAlInIDQRDQMaIN7oyPRFAbtbQ/91wF5qz5GV/99zeYzFY3RkhE1PRYiBIRuZFWo8TgPhGI7hggjY5arCK27z2Nv3y4H2lHC2ATRTdHSUTUNFiIEhG5mSAI6B4VgHmTb0f39gHS/uKKGqz++ijmfXwA6ceLYLOxICWi1oUL2hMRtRC9Owfj7RfvwDdpp7FuRzb0BjMAIO+iHis2ZyEsUIP7BnXC8NvaQ+erks4LDPR16MdmE1FRYWzW2ImI
GoOFKBFRCyETBEAQMKhvJOK6h+Jfe09jR9oZ6VnRi2VG/P2fx/CPnScwJC4S9wzogKiwNhCvuHXfRqvirS4i8hgsRImIWpCrXxN6Z3xb/HahCmcLq2Cy2Jd2Mpmt+P5gHr4/mIewQF90CNMiItgXMkHga0KJyKOwECUiasF8VAr07hSInu39cb64Gr/lV6Ky2iy1Xywz4GKZAT4qOTpGtEGFvpaFKBF5DBaiREQeQC6XoUN4G7QP06G0sha/5VeioNSAurvyNSYrjp8tx5+W7Mad/aIwLC4S7UK07g2aiKgeLESJiDyIIAgI9vdBsL8PBLkMuecqkJNXBpPZftveYhXx7YE8fHsgD327BmNkUgf07BAAQRDq6ZmIqPmxECUi8lC+Pkr07RaCTuE65JdU49SFSpTpTVL74ZMlOHyyBB0j2mBkUgfc1isUchmnMhFRy8FClIjIw8llAqJCdWgXokV8r3Ds3H8GP/9aiLq59GcKqvDB179i4w8+uDMhEoP7tEWATn3DPomImgMLUSKiVkIQBCT0DEP/3hE4W1CJbT+ewn8O5kmz7Usqa7Bp92/YvOcUEnqEYmBsW/TvHQ4/reqavrgWKRE1BxaiREStiEwQYBMBf50a40f2woN3dMG3B87i25/PourSAvk2m4j07ItIz74ImSCgSzt/RHcKRHSnIHRvH4DQIF+uRUpEzYKFKBFRK3P1WqQAMLRvW1woMeBsoR4llTXSfpsoIvdcOXLPlWPbj6cgCEDXdv72Z0/DdOgW5Q8fFf9XQURNgz9diIi8gFwuQ/swHdqH6VBdY0Z+iQH5JQaUVdU6HCeKQO65CuSeqwBgH2Ht1LYNenYIQO9OQejZPgAKOcdLicg1WIgSEXkZrY8S3dr5o1s7f9SarSiprEFJhf2r0mB2ONYmivjtQiV+u1CJf6edhY9KjpjOQYjrGoK+XYOdPl9KRNRQLESJiLyYWilHZLAWkcH2xe9rzVb07BSMk+fLcTinGHkX9Q7H15isSD9ehPTjRRAAdI70Q1zXYMR1C0H7MB3XKyWim8JClIiIJGqlHHcmRuHupA6wWm2orDbh2OlSHM4tRnr2RRSWGqRjRUAaLd285xSC/X1wW3Q4EnqEomfHQPj5qjjznohuiIUoERE5qJt5rzeaIZMJiOkSjJguwXh0RA9cKK5GxokiHDpRhJyz5bDVvWMUQElFDb5JO4Nv0s4AAMKDfNEpog26RPqhS1s/RIXpoFbK3XVZRNQCsRAlIqJrOJt5f6WeUf7oHK7DxXIjCkuNuFhmhNlqczimsNSAwlID9h8tBAAIAhAR5Iv2YTp0CG+DDmE6tA9vA38+Z0rktViIEhFRo6iUckSF6hAVqoNNFFFWWYvCMgNKKmuhN5ivKUxFEdJs/Z+PXZT2+2tV9hn94fZZ/eGBvggN0ECnUTb3JRFRM2Mheh0mkwlvvPEGduzYAZVKhSeffBLPPvusu8MiImqRZIKAYH8fBPv7AACeuK83zl3UI/t0CY6fKcPJ85W4UKzHFXfyJRXVJlScKsWRU6UO+7UaJSKCfBHYRg1/rQp+vkr4aVXw81WhjVYFrY8CvmoFNGoFlAoZJ0oReSAWotexaNEiHDp0CB9//DEKCgowffp0REZGYvTo0e4OjYioxVMr5egaFYCwQA2GJkQBAGpMFpy7qMfZgiqcLajCmYIqnC2sgslsddpHtdGMk+crGvR5CrkMWo0Cvj5K6DRKaH2U0Pgo7MWqjwJyAL4+SrTxVaKNrwptfJXw81VBp1FCJmMBS+QuLESdMBgM2LBhA1atWoXY2FjExsbimWeewfr161mIEhE10I2eM9Wq5ejdMQDRHfxRXWNBRbUJldUmVBlMqK6xwFBjgdXmZPj0OixWGyr0JlToTTcVowD7yGsbX3vxqlbJ4aOU2/9UyaFSyqFSyKBSyqFUyOx/V1z6u/LS3y/9qVLIoFTIpPO48D9R/ViIOpGdnQ2TyYTExERpX2JiIlJTU2Gx
WKBQMG1ERK4gCAJ0GvsoZrsQrbRfFEWMuasHTudXYvt/T6LWbL30ZUOt2QqT2QqzxQaz1Qazxeb0ln9DiLCvDqA3mus99mbVFaU+agV8VAr4qORQyAT4qOQOBa9aKYdcJkAmEyCXySCTCZAJkPbZ9wuQCfa/C4IAAfbJX4IgSH/aB3btf167H7hYZYJMEFBRYbiiHRBg/7vs0j4Il/uQ8nSdBF+5W3Sy0+EsZ8de4VL4l/60X+Olv176U7j+MdJ+ATf9hMal/6eXVtY0+t9RHdHplTX45EYTBAFBfmqPfDxFEK/3r8uLffPNN/i///s/7N+/X9p38uRJ3HfffdizZw/CwsLcGN3NE0XxpkYWriS/9BOsseezD/bhCX20hBjYxy30IdrfAGWz2X/W2WwirJe2xSv2W20iLFYbrFb7n3V/J2oNfNQKdI3yh8zDilEO7TlhNBqhUjkuJ1K3bTLd3G2flkAQBCjkt/YP81bPZx/swxP6aAkxsI+m6YOIWiY+wOKEWq2+puCs29ZoNO4IiYiIiKjVYSHqRHh4OCorKx2K0aKiIqhUKvj7+7sxMiIiIqLWg4WoE9HR0VAqlTh06JC0Lz09HTExMZyoREREROQiLESd0Gg0ePjhhzFv3jwcPnwY3333HdasWYM//OEP7g6NiIiIqNXgrPnrMBqNmDt3Lnbu3AmtVotJkyZh0qRJ7g6LiIiIqNVgIUpEREREbsFb80RERETkFixEiYiIiMgtWIgSERERkVuwECUiIiIit2AhSkRERERuwUKUiIiIiNyChSgRERERuQULUSIiIiJyCxaiREREROQWLESJiIiIyC1YiBIRERGRW7AQJSIiIiK3YCFKRERERG7BQpSa3NmzZ/Hcc8+hf//+GDp0KBYuXIja2loAwPnz5zFp0iTEx8dj1KhR2L17t5ujbX6zZs3ChAkTpG1vz4nZbEZKSgoGDBiAAQMGYM6cOTCZTAC8OzcVFRV49dVXkZSUhDvuuAOLFy+G1WoF4J15MZlMuP/++7F3715pX315SEtLwwMPPIC4uDhMmDABZ86cae6wm5yzvOzbtw/jxo1DQkIC7r33Xnz55ZcO53hrXq5sGz16NJYtW+aw31vzUlhYiKlTpyI+Ph533nknPv30U4dzXJ0XFqLUpEwmE5577jmoVCp8/vnnWLx4Mb799lssWbIEoihi6tSpCAgIwMaNGzFmzBhMmzYNeXl57g672ezbtw8bN26UtpkTYNGiRdi1axdSU1OxcuVK7NmzBytWrPD63MybNw+FhYVYv3493n77bWzZsgUff/yxV+altrYWf/7zn5GTkyPtqy8P+fn5mDJlCh588EFs2rQJISEhmDp1Kmw2m7suw+Wc5eX06dP44x//iBEjRmDLli14/vnnMX/+fHz//fcAvDcvV1q5ciVyc3Md9nlrXmw2G6ZMmYLa2lps2rQJr776KlJSUvDTTz8BaKK8iERN6MCBA2JMTIyo1+ulfV9//bU4aNAgce/evWKfPn3EqqoqqW3ixIniu+++645Qm111dbU4fPhw8dFHHxXHjx8viqLo9TmpqKgQY2JixB9//FHat2nTJvHpp5/2+tz069dP3LVrl7SdkpLilXnJyckRH3zwQfGBBx4Qe/ToIf7000+iKNb/vbN06VLx0UcfldoMBoOYkJAgne/prpeXFStWiI888ojDsX/5y1/EP/3pT6Ioem9e6hw7dkwcPHiwOHLkSPH999+X9ntrXn744QcxISFBLCsrk46dPXu2uGzZMlEUmyYvHBGlJtWlSxesXr0aWq1W2icIAkwmEzIzM9G7d2/odDqpLTExERkZGW6ItPktWbIESUlJSEpKkvZ5e07S09Ph4+ODQYMGSfvGjh2LDz/80OtzExAQgK+//hpGoxGFhYXYs2cPYmJivC4vBw8exODBg/HFF1847K8vD5mZmejfv7/UptFoEBMTg0OHDjVL3E3tenkZNWoUZs+e7bBPEATp8ShvzQsA
WK1WvP7663j11VcREBDg0OateUlLS8OAAQMc8jF//ny88MILAJomL4pGn0nUAEFBQQ5Fhc1mw/r165GYmIiioiKEhYU5HB8cHIyCgoLmDrPZHTp0CDt27MD27duxZs0aab835wSwP0/crl07bN++HatWrYLBYMDIkSPx8ssve31u5syZg+nTp6Nfv36w2Wy4/fbb8eKLLyIlJcWr8vLoo4863V/fv4/rtRcWFjZNoM3sennp3Lmzw3ZxcTH++c9/SoWFt+YFAD766CMEBgbi4YcfvqYg89a8nD17FpGRkViyZAm2bNkCnU6HJ598EsnJyQCaJi8sRKlZpaSk4NixY9i4cSM+/vhjKJVKh3aVSgWz2eym6JqHyWTCrFmz8Prrr8Pf39+hzWg0emVO6lRXV+PcuXNYv3495s2bh+rqasybNw8Wi8Xrc3P27Fn07t0bzz//PPR6Pd544w289dZbXp+XOvXlwWg0QqVSXdNeNxHOGxgMBrzwwgsICwuTChFvzcupU6fw0UcfYdOmTU7bvTUv1dXV2Lp1K+655x6sWLECR48exfz58xEYGIi77767SfLCQpSahSiKePPNN/HZZ5/hvffeQ/fu3aFWq6HX6x2OM5lM8PHxcVOUzWPFihXo2LEjRo0adU2bt+akjkKhgF6vx9tvv40OHToAAKZPn47p06djzJgxXpubs2fPYsGCBfj+++8REREBwP5vZdKkSUhOTvbavFypvu8dtVp9zf8sTSbTNbdkW6uqqir88Y9/xLlz5/CPf/wDGo0GgHfmRRRFzJo1C1OmTEFUVJTTY7wxLwAgl8vh5+eHN954A3K5HLGxscjOzsZnn32Gu+++u0nywkKUmpzNZsOsWbOwbds2LFmyBHfffTcAIDw8HNnZ2Q7HFhcXIzQ01B1hNptt27ahqKgICQkJAOzLFVmtViQkJOCPf/yjV+akTlhYGBQKhVSEAvZbi7W1tQgNDcWJEyccjveW3Bw5cgRarVYqQgEgNjYWVqvVq/Nypfp+noSHh6OoqOia9u7duzdbjO5SWlqKp59+GsXFxfjkk08cvr+8MS8XLlxAeno6jh49ivfeew8AUFNTg6ysLGRmZuLDDz/0yrwA9p/BNpsNcrlc2te5c2fs27cPQNP8e+FkJWpyCxcuxLZt27Bs2TLcc8890v64uDhkZ2fDYDBI+9LT0xEfH++GKJvPunXrsH37dmzZsgVbtmxBcnIyYmNjsWXLFq/NSZ34+HhYLBYcP35c2nfy5ElotVrEx8d7bW7CwsJQWVmJ/Px8ad/JkycB2CcEemterlTf905cXBx++eUXqc1oNOLo0aOtPk91S+iVlZXh008/RZcuXRzavTEv4eHh2LlzJ7Zu3Sr9HI6Ojsajjz6KN998E4B35gUAEhIScOLECYdHe3Jzc9GuXTsATZMXFqLUpDIyMrB27VpMmzYNsbGxKCoqkr6SkpIQGRmJGTNmICcnB6tXr0ZmZqb0UHRr1a5dO3Ts2FH68vPzg4+PDzp27Oi1OanTqVMnDB8+HDNnzsSRI0dw8OBBLF68GI888ggGDhzotbmJj49HdHQ0Zs6ciezsbGRkZGD27Nl46KGHcO+993ptXq5U3/fOuHHjkJmZKa0ZOWvWLERGRmLgwIFujrxp/f3vf8evv/6KlJQUaDQa6edveXk5AO/Mi0KhcPgZ3LFjR6jVavj7+yM8PByAd+YFAO677z4oFAr85S9/walTp7B161Z89dVXePzxxwE0UV4avfATUQMsXLhQ7NGjh9Mvs9ksnj59WnziiSfE2NhY8b777hP37Nnj7pCb3bvvviutIyqKotfnpKqqSpwxY4bYr18/MSkpSVywYIFoMplEUfTu3BQUFIjTpk0Tk5KSxMGDB4tvvPGGaDQaRVH03rxcvS5kfXn44YcfxHvvvVfs27evOGHCBPHMmTPNHXKzuDIvY8aMcfrz98q1IL0xL1d79NFHHdYRFUXvzcvJkyfFiRMnirGxseLv
fvc7ccOGDQ7HuzovgiiKoosKaSIiIiKiBuOteSIiIiJyCxaiREREROQWLESJiIiIyC1YiBIRERGRW7AQJSIiIiK3YCFKRERERG7BQpSIiIiI3IKFKBERERG5BQtRIiIiInILFqJERERE5BYsRImImtldd92F+fPnuzuMZnPu3Dn07NkTO3bscHcoRNTCsBAlIiIiIrdgIUpEREREbsFClIi8Us+ePfH5559jypQpiIuLw1133YX169dL7de7nfzQQw9hxowZAID9+/dL/QwZMgTDhg3DuXPnAABffPEFRo8ejb59+2LkyJHYsGGDQz81NTWYO3cukpKSkJiYiNdeew16vV5q1+v1+Otf/4rf/e53iI2Nxe23347XXnsNlZWV0jGZmZl44oknkJCQgKSkJEybNg3nz593+JxPPvkE99xzD2JjYzF69Gj861//uqk8TZgwAU8//bTDPpvNhsGDB+O9994DAFy8eBEzZ87EkCFDEBMTgyFDhuDNN9+EyWRy2ueyZcuQkJDgsO/YsWPo2bMn9u/fL+07cuQIJk6ciLi4ONx+++144403YDQabyp+ImrZFO4OgIjIXRYvXoxhw4Zh2bJl+Omnn/DGG29ApVLhkUceual+UlNTMX/+fFRWViIqKgoff/wx3nrrLTz55JMYOnQofv75Z8yePRu+vr64//77AQCbN2/GyJEjsXTpUpw4cQKLFi1CYGCgVOS+8soryMnJwSuvvILQ0FBkZmbivffek44xGo2YPHkyBg8ejBdeeAGVlZV4++238ec//xlffPEFAGD58uVYuXIlnn32Wdx2223YvXs3/vznP0MQBIwaNapB13b//fdj/vz5KCsrQ2BgIAB7AV5cXIz7778fNpsNzzzzDARBwJw5c6DT6fDjjz/iww8/RIcOHTBhwoSbymWd3NxcjB8/HvHx8Vi6dClKSkrwzjvv4Ny5c/jggw8a1ScRtTwsRInIa3Xp0gXvvPMOAGDo0KHIz8/HqlWrbroQnThxIu666y4A9tHCVatWYezYsVJROWjQIOTl5SE9PV0qRDt37ox3330XgiBg0KBBSEtLk0YDa2trYTabMXfuXAwdOhQAMGDAABw6dAg///wzACAnJwfl5eWYMGGCNLoYGBiItLQ02Gw26PV6rF69Gs888wz+9Kc/AQCGDBmC6upqvPPOOw0uREeOHIk33ngD3377LZKTkwEA//73v9GrVy907doV+fn58Pf3x6xZs9CrVy8AwMCBA7Fnzx4cOHCg0YVoamoqgoODsXr1aqhUKgBAp06d8MQTT+DAgQPo379/o/olopaFhSgRea377rvPYXv48OH45ptvUFBQcFP9dOvWTfr7qVOnUF5eLhWmdeoK3jpxcXEQBEHajoqKQk5ODgBArVZjzZo1AOyPCJw+fRo5OTk4efIk1Go1AHsRHRAQgOeeew6jR4/GsGHDMHDgQCQlJQEAMjIyUFtbizvvvBMWi0X6nKFDh2LTpk3Iy8tD+/bt6702f39/DBkyBDt27EBycjKsVit27dqFSZMmAQDatm2LdevWwWaz4fTp0zh9+jSys7NRUlKCyMjIBufwavv378fw4cMhk8mk+OPj46HT6bBv3z4WokStBAtRIvJaYWFhDttBQUEAgPLycuh0ugb3U3de3blX73NGo9E4bAuCAFEUpe3vvvsOKSkpyMvLQ2BgIGJjY+Hj4wObzQYA0Ol0WL9+PVasWIHNmzfj008/hZ+fH15++WU8/vjjUhyPPvqo088vKipqUCEKAA888ACmT5+O8vJyHD16FGVlZRg9erTU/uWXX2Lp0qUoLi5GaGgo4uLioFarHa7nZpWXl+OLL76QHjO4OnYiah1YiBKR1yorK3PYLikpAWAvIs1mMwBIhV8dg8Fwwz7btGkDACgtLXXYf+rUKZSVlaFfv371xnX69Gm89NJLGDNmDNavX4+IiAgAwEsvvYSTJ09Kx3Xv3h1Lly6FyWRCeno61q5di3nz5iEm
JkaKY8WKFQgPD7/mMzp37lxvHHXuuusuqFQqfP/998jIyEBCQoI02ln3/OvUqVMxfvx4qQD//e9/f93+BEG4Jq/V1dUO2zqdDsOHD8djjz12zfl1z6oSkefjrHki8lo//PCDw/Z3332HLl26ICwsTBoRvXjxotReWFgozYq/nrpb5lf3/d5772HRokUNiuvo0aMwm82YPHmyVIQaDAakp6dLo4z//e9/MXDgQJSWlkKlUmHgwIGYPXs2AODChQuIi4uDUqlESUkJ+vTpI33l5ORgxYoVDYqjjkajwV133YX//Oc/+Pbbb6XnXAH7IwCCIGDKlClSEVpYWIgTJ05cd0RUp9OhpqbGYQWA9PR0h2MSExPx22+/ITY2Voq9bdu2eOedd6RHGIjI83FElIi81p49ezB//nzcdddd+OGHH7Br1y4sXboUgP3ZyLi4OKxZswZt27aFXC7H8uXL4efnd8M+FQoF/vjHP+Ltt99GYGAgBg4ciAMHDmDHjh1Yvnx5g+KKjo6GXC7H22+/jcceewxlZWVYs2YNiouLpYk7ffv2hSiKeOGFF/Dss89CqVRi7dq18PPzw4ABAxAUFIQJEyZg4cKFqKioQN++fZGdnY0lS5Zg+PDhN/XoAWC/PT916tRrZtz36dMHNpsNCxYswMiRI5Gfn4+VK1fCZDJdd6mlO+64AykpKZg1axaeeOIJZGdn4x//+IfDMVOnTsWjjz6Kl156CePGjYPJZEJqairy8/PRu3fvm4qdiFouFqJE5LWeeeYZHDt2DFOnTkWHDh2wZMkSjBw5UmpPSUnB3Llz8eqrryI0NBSTJ0/G3r176+130qRJUKvVWLt2Lf7+97+jU6dOePfdd3H33Xc3KK7OnTvjrbfewvLlyzF58mSEhoZi6NChGDduHObPn4/CwkKEh4fjww8/xDvvvIPp06fDbDajb9+++Pjjj6WRyf/93/9FUFAQNmzYgPfffx9hYWGYOHEiXnjhhZvO1ZAhQ+Dn54eYmBiH518HDhyImTNnYu3atdi0aRMiIiIwatQoKBQKrF271ulaol27dsVf//pXaWmpuLg4vP/++w6rFcTGxmLt2rVYunQppk2bBrVajX79+mHRokVOHzUgIs8kiLfyNDkRkYfq2bMnpk+ffs1i7URE1Hw4IkpE5IVEUYTVaq33OIWC/5sgoqbDnzBERF5o8+bNmDlzZr3HHT9+vBmiISJvxVvzREReqKysrN4VAAD7ZCQioqbCQpSIiIiI3ILriBIRERGRW7AQJSIiIiK3YCFKRERERG7BQpSIiIiI3IKFKBERERG5BQtRIiIiInILFqJERERE5BYsRImIiIjILViIEhEREZFbsBAlIiIiIrdgIUpEREREbsFClIiIiIjc4v8Bp26diELMwEwAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 720x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots(figsize=(5, 2))\n",
    "sns.histplot(data=df_train, x='purchase_value', kde=True, bins=30, ax=ax);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a81640cb",
   "metadata": {},
   "source": [
    "Age is closer to normally distributed than purchase value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "ab89b72c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAApMAAAEuCAYAAADSqpnLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAABYlAAAWJQFJUiTwAABOA0lEQVR4nO3deXxU1d0/8M+dmcySyZ5MNgJhCYFAYhIiIGqhFlARF5ZilWJVLFTQ4tPH/hDksYBWoS4FBaLlqfgotCpCBcFWcUHEJQqBhLAEQggkhCyTPbNk1vv7Y+DCGNbJJDOTfN6v17zIuefeM997mCTf3HvPOYIoiiKIiIiIiDwg83UARERERBS4mEwSERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkccUvg6gu3M6RdjtDl+HcVlKpetjYLXafRxJ98T+7Tzs287F/u087NvOxf69dgqFHDKZ4NmxXo6FfsJud6C52ezrMC5LpwsFAL+PM1CxfzsP+7ZzsX87D/u2c7F/r114uEZKwq8Vb3MTERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkceYTBIRERGRx5hMEhEREZHHmEwSERERkcc4zySRHwoP13g8eeyFnE6R86wREVGnYjJJ5IdkMgGCTECr0epxG6FaJW89EBFRp2MySeSnWo1WfPDZUY+PnzZ+EEI0QV6MiIiIqD1euCAiIiIij/HKJJGXXep5x8jI4KtuQy6XQRA6/swkERFRZ2MySeRlP33eURDsAABRFK+6jfAQFSxWB5qNVoiiCEEQECSXQaOSM8kkIiK/4tfJZHNzM5577jl8/fXXUKlUuOeee/CHP/wBcrkclZWVeOaZZ7Bv3z4kJCRgwYIFGDNmjHRsXl4enn/+eZSXl+O6667Dn//8ZyQnJ0v169evx//+7/+itbUVt99+O5555hkEB1/9lSOiy7nweUetVgUAMBotlz1GFEXUN7fhdJ0RIoDTtQb8NP9UKmSICFUhNkKDJJ0WyiB5Z4RPRER01fz6mcmlS5eipqYGGzZswEsvvYQtW7bgrbfegiiKmDt3LiIiIrBp0yZMnjwZ8+bNQ0VFBQCgqqoKc+bMwd13343NmzcjJiYGc+fOhdPpBADs2LEDK1euxOLFi/HOO++gqKgIy5cv9+WpUg9msztxrKIJn+89je8O1aC8xoCKmvaJJABY7U7UNppxsKwBO/ZUYG9xLRpbL5+kEhERdSa/TiZ37dqFBx98EKmpqbjhhhtw5513Ii8vD3l5eSgrK8Ozzz6LlJQUzJ49G9nZ2di0aRMAYOPGjRg8eDBmzZqFlJQUvPDCC6iqqkJeXh4A4O2338aMGTMwduxYZGRkYMmSJfjwww9hNBp9ebrUw9gdThytaMLn+adRXN4Es9XhVi8IgFatQJhWidDgIATJ3b9dnSJwpt6E3Qeq8MPhGjQZmFQSEVHX8+vb3BEREfjoo49w0003oaWlBbt378a4ceNQWFiIIUOGICQkRNo3JycHe/fuBQAUFhZi+PDhUp1Go8HQoUOxf/9+jBw5EkVFRZgzZ45Un5WVBYfDgSNHjuD666/vuhOkHkkURZTXGlB8qgkWm3sCqVTI0EunxW8nZSAuKhgffXXc7Thjmx11TWZU1BrQaDg/B2VNoxk1jWYk6bRIS46ERuXX39pERNSN+PVvnMWLF2P+/PkYNmwYnE4nbrjhBvz+97/HsmXLEBsb67ZvdHQ0qqurAQB6vf6i9TU1NWhpaYHFYnGrVygUiIiIkI73JqVSAZ0u1OvtdoZAiTMQCIJdelbyHK1Whao6IwqO6dtdRQzRBGFo/2gkJ4RBLhMwtH80DCZbuzZCQoC4mBAMTdGhqdWCQ2X1KK9ulepP642oqjch
rW8ULDYHIkLVPeL/tSecoy+xfzsP+7ZzsX+7hl8nk+Xl5RgyZAgee+wxGAwGPPfcc/jLX/4Cs9mMoCD3yZiVSiVsNhsAwGw2Q6lUtqu3Wq1oa2uTyherJ3I4RRhMnn8WwrTKdtsaW9tQcEyP6nqT23aNSoHrUmLQNyHsmpdPjAhV4abrEpHe34LCkjpU6g1S/AdP1GP+qm/wmzvScEtOb68szUhERHQxfptMlpeX44UXXsCXX36J+Ph4AIBKpcLMmTMxbdo0GAwGt/2tVivUarW0308TQ6vVioiICKhUKql8qeO9yWq1+/3ayOf+ctPrW6+wZ/cXGRkMg9nWoZVnZt6TAcA1ettsseP4mRaUnWlx20cuEzCgVxhSeoVDIZfBbP5J8np28M2VRoADgEIAclJj0CdWi0MnG9FydkqihpY2rHxvP7bsKsX02wYhrW/UNZ9LqFYJ0SmisdF05Z19gJ/dzsX+7Tzs287F/r124eEaKJWepYV+m0wePHgQWq1WSiQBID09HQ6HAzqdDseOHXPbv66uDjqdDgAQFxcHvV7frn7gwIFSQllXV4fU1FQAgN1uR1NTU7tb40Seqqhpxf4SPU7rje1GZSfHhWBQnwioPfymvRRdhAZjMtWoqDXgyAXPY56sasEL/7cH8VHBGNI38pqWWOSSjEREdCV+O5o7NjYWLS0tqKqqkraVlpYCAPr374/i4mKYTOevluTn5yMrKwsAkJmZiX379kl1ZrMZhw8fRlZWFmQyGTIyMpCfny/VFxQUQC6XIy0trZPPirozu8OJ8ppWPP36t1iY+y0qat0TybhIDX6enYjMlBivJ5LnCIKAPnGhGJvTC/eOHYggxflv8eoGE3bur8TBsga0/WTkOBERkaf8NpnMyspCWloaFi5ciOLiYhQUFOCZZ57BPffcg9tuuw2JiYlYsGABSkpKsHbtWhQWFmLatGkAgKlTp6KwsBCvv/46jh8/jkWLFiExMRGjRo0CAEyfPh3r1q3Djh07UFRUhKVLl2Lq1KnQarW+PGUKQDa7AxW1Bvx4pBaf/FiBguP1OFzW4LaPLkKDX1zfGyOHxCEsuP3zlJ1BIZdh+m2D8fK8nyFJd/5zLYrAiTMt+CL/9Nmk0t4l8RARUfflt7e5FQoF/va3v+GFF17Agw8+iKCgINx+++344x//CLlcjtzcXCxatAhTpkxBnz59sHr1aiQlJQEAkpKSsGrVKixbtgxvvPEGMjMzkZubC5nMlTtPnDgRlZWVWLJkCaxWK8aPH48FCxb48nQpgFhtDlQ3mHCm3gR9k/mik4vLZALiIjUY0CsMvePDAVzd84/eFh2uwbBUHfolhOFQWQMazk5w7nCKOHGmBSerW5EcF4KBvcKh5nRCRETkAb/+7REXF4dXX331onXJycnYsGHDJY8dM2aM2/KKPzV79mzMnj27wzFSzyCKIqrqTThV04q65raLJpAAEBochKm3DMT1aXHY8X1Z1wZ5GZGhKtyUEY/qBhOOVTSj+ewgHadTRFlVK05Vt6JPXCgGJoVzjkoiIrom/K1BdBlOp4hTNa0oPdMCU9vFbwlHhCiREB2MhGgtQjRBmDRmAAxmWxdHemWCICAhWov4qGDUNJpxrKIJTWcnPneKwMnqVpTXtKJ3XCgG9gpHsJo/HoiI6Mr424LoEuqb23DgRD1aTe0Tw8gQJRJitEiMDkawOrBGOwuCgPioYMRFalDbaMbRnySVp84llbEhGH9DX47mJiKiy2IySfQTVpsDhcfrcKrGfS7TIIUM/eJD0Sc+FMHd4FawIAiIiwpGbKQG+qY2HK1oQuPZZypFESivMeCp1d9gxoTBGJGq48TnRER0UYH/G5HIi6rqjVj+9l6UX5BIymUCUntHoF9CKBRyv50AwWOCICA2UgNdhBp1za6ksqHFlVSaLHas3XIQX/YKx+/uHorocO9P7E9ERIGNySTRWYdPNiB3y0G3ZyMTooOR3i+qRwxKEQQBuggNYsLV
0De1oehEPYxn++J4ZTOWvPUjfnvnEGSmxPg4UiIi8ifd/zck0VXIP1qLv310CHaHa5i2TADS+0cjOS4EgtCzbu+eu1L58+xEBAersHnncTidIoxtdry66QCm3zoIv/xFyjX1i9Mp+v2yokRE5Bkmk9Tj7T5wBv/3n2Jpup/IUBXS+0UhMlTl28B8TC6T4de3DcawQbF4cf1e6XnKf+44Cn2TGTNuH3xVz1GGapX+uzoCERF1GJNJ6tG+LarCW/8ulsoJMVrMn5GDnXvKfRiVf+kdF4oRabHIP6pHXXMbAOCzH8tx+EQ9slNjILvCFUqu701E1L3xggH1WHuLa7Hu30ekcp/YELww50bERGh8GJV/UgXJMXJIHBKjg6VtlXVGFJTUQbzUDO5ERNQj8MokdRvh4Zqrnr5m/9FarN12SLq13TchFM/OHoWIUDVazq4OQ+7kMgE5g3RQnmjAyepWAMBpvRFBChnS+0X1uGdLiYjIhckkdRsymQBBJqD1Csng6VoDXtyQLw22SYjW4o+/zmEydBUEQUBG/yg4RVGaPqmsqhVKhRyD+kT4NjgiIvIJJpPUrbQarfjgs6OXrLfYHNhdWIU2qwMAoFHKkd4vEju+PwkAmHlPRleEGdAEQUDmgGjY7U6cqTcBAI5WNCE0OAiJMVofR0dERF2Nz0xSj+F0ithbXAuTxTV3olwmYMSQuB4xh6S3CYKAYak6xFwwifn+kjo0Gyw+jIqIiHyByST1CKIo4kBpPepbzic7w1JjEK5V+jCqwCaTCbh+kA5atSsZdzhF/FhcC8vZq75ERNQzMJmkHuHEmRaU155fIjEtORIJ0bwl21HKIDlGpMVCIXc9b2q2OLDnaC2cTo7wJiLqKZhMUrdX02DCoZONUjlJp0VKrzAfRtS9hAYrkZOqk8oNLRYUnajnlEFERD0Ek0nq1lqMVuQf00vlyFAVMlNiOHLby+KigpGWHCmVT9UYpOmDiIioe2MySd2WxebAj0dqpSmANCo5RgyOhfwq56Kka5PSKwy9LhjNfbCsQVoxh4iIui8mk9QtOZwi9vx05HZaHFRKuY8j674EQUBWSjQiQlyDmkQR2Hu0FnVNZh9HRkREnYnJJHU750ZuN1wwcjsnVceR211ALpdh+OBYKINcP1qsNidee78AFhtHeBMRdVdMJqnbKT3TgooLRm4PSY5E/AVrSlPn0qgUGD4oFuceSy2rasEb/zrAATlERN0Uk0nqVvYfrcXhC0Zu947VYgBHbne56HA10vtFSeWv9lXi872nfRgRERF1FiaT1G2cqmpB7uYDUjkqVIXrBnDktq/0jQ9Fn9gQqfz+l8dx5GSDDyMiIqLOwGSSuoUWoxUvvL3n/JrbKjmGc+S2TwmCgIwB0RjQKxwA4BRF5G45iOoGk48jIyIib2IySQHPZndi9YdFqG10jRqWywSM5MhtvyCXCZj3qyxEhqoAAMY2O1ZuLESLyerjyIiIyFuYTFJAE0UR73xSjOOnmwEAAoCcQTqEceS234gKU2PBg9dDqXD9uKltMmPV5gOwcoQ3EVG3wGSSAtonP5bj24PVUvlX41MRH8WR2/4mtXckZt89FOceOiitbMHfPz4CJ0d4ExEFPIWvAyDy1P4SPTbtLJXKv7i+N+64sS82fX7Mh1HRT2k1QZDLZRg7MhkmmwPrth0GAOwtrsXHcaH4zR1pV9WO0ymiuZkToBMR+RsmkxSQTla3YO1Hh3HuulZqUjgenZwOi83p07ioPZkgwCkCBrMNt+T0xulaA3b8UA4A+HBXKcJDVBg7vPdl2wjVKnkbhYjITzGZpIBT12zGqx8ckFZViQlXY+6UDAQp5Ewm/ZTBZMUHnx0FAKjkAuKjNKhucF1l/L+PD+PwibrLPp4wbfwghGiCuiRWIiK6NvxjnwKKqc2GlR8cQLPRNRo4WKXAf03LRFgwB9wECkEQMCxVJ63hDQD5R/VobLVc5igiIvJXvDJJfiE8XAPZFeaEtNkd
+OvGQpypMwIAFHIZFj40HEP7RwNwrQvNCcoDg0Iuw4i0OHxzoAomix0Op4gfDtfg5usSeAWSiCjA8Mok+QWZTIAgE2Aw2y76ajVZ8er7BTh4ol46ZtY9Q9E3IUzahwKLWinHDUPjpCmDrHYn8g7XSI8vEBFRYOCVSfIbrcbzz9X9VPGpRhw7O5ckAAzuE4HKmla3/Wfek9HpMZJ3hWiCMCItFt8dqoHTKcLUZscPh2twY3o8FHL+rUtEFAj405r8XnlNq1simRwXgoFJ4T6MiLwpKkyNnNQYqdxksCL/qB4i56AkIgoITCbJr9U2mlF4/Pyt7dgIDTIGRPPZyG4mIVqLjP5RUrmm0YzDJxt9GBEREV0tJpPkt5qNVuwprpXmkgzTKnH9IB1kTCS7pX4JYUjpFSaVS8+0oLym1YcRERHR1fDrZNJms2HZsmUYOXIkRo4cicWLF8NqdU0JU1lZiZkzZyIrKwsTJkzArl273I7Ny8vDXXfdhczMTDzwwAM4deqUW/369esxevRoZGdnY+HChTCZTF12XnRlZovr2TmH05VKqpVy3JAWC4XCrz+y1EFpyZGIj9JI5QOl9WhoafNhREREdCV+/Zv5xRdfxGeffYbc3Fy8/vrr2L17N9asWQNRFDF37lxERERg06ZNmDx5MubNm4eKigoAQFVVFebMmYO7774bmzdvRkxMDObOnQun0zWh9Y4dO7By5UosXrwY77zzDoqKirB8+XJfnipdwHZ2VG+b1TWqVyEXcMOQOKhVHC/W3QmCgGEDdQgNdk0P5BSBPcV6NBs4ByURkb/y22SypaUF7777Lp577jnk5ORg2LBhePzxx3Ho0CHk5eWhrKwMzz77LFJSUjB79mxkZ2dj06ZNAICNGzdi8ODBmDVrFlJSUvDCCy+gqqoKeXl5AIC3334bM2bMwNixY5GRkYElS5bgww8/hNFo9OUpE1zrL+89WotWk2uqH0EAhg+ORZiWk5L3FAqFDCPTYqUpgyw2B9ZsOgCHg6sbERH5I79NJvPz86FWq3HjjTdK26ZMmYK///3vKCwsxJAhQxASEiLV5eTkoKCgAABQWFiI4cOHS3UajQZDhw7F/v374XA4UFRU5FaflZUFh8OBI0eOdP6J0WUdLGuAvun8bc3MlBjoIjSXOYK6o2B1EIal6qTykZMNePezYz6MiIiILsVv7xuWl5ejV69e2L59O9544w2YTCbcfvvt+MMf/gC9Xo/Y2Fi3/aOjo1FdXQ0Al6yvqalBS0sLLBaLW71CoUBERIR0vDcplQrodKFeb7cz+DrOz38sxcnq8wMu0vtHI61f9NU3cHZcjlar8jyITmzjmtr083Ppijb6aVUwWOw4WOoazb9553Fcl6LDsMGx7fb19We3u2P/dh72bedi/3YNv00mjUYjTp8+jQ0bNmDp0qUwGo1YunQp7HY7zGYzgoLcl1xTKpWw2Vy3Rs1mM5RKZbt6q9WKtrY2qXyxevKN/Udrsf6TYqncJz4U6QOuIZGkbim9fzTqmsyorncNkFv53j6s+uMtCA/pQHJLRERe5bfJpEKhgMFgwEsvvYQ+ffoAAObPn4/58+dj8uTJMBgMbvtbrVao1WoAgEqlapcYWq1WREREQKVSSeVLHe9NVqsdzc1mr7frTef+ctPrfTMNS3WDCX9Znw/n2ZHbESFKZPSNhMl0jcn92TmEjMYODNbohDbOXZG7pjb99Fx80UZm/2iY2uxoMVrR2GrBKxv24vEpGRAEweef3e6O/dt52Ledi/177cLDNVAqPUsL/faZydjYWCgUCimRBIB+/frBYrFAp9NBr9e77V9XVwedzvWMVVxc3CXrzyWUdXV1Up3dbkdTU1O7W+PU+YxtNry66QCMZ9fWVivlGDE4FnIupUdnqZRyzLonXSrvL6nD7gNVPoyIiIgu5Le/sbOysmC323H06Pm1l0tLS6HVapGVlYXi4mK3uSHz8/ORlZUFAMjMzMS+ffukOrPZjMOH
DyMrKwsymQwZGRnIz8+X6gsKCiCXy5GWltb5J0YSh9OJ17ccRE2D6/9RqZBhxOBYTgFE7WSl6nD7qGSp/O7nJdLnhoiIfMtvk8m+ffti7NixWLhwIQ4ePIi9e/fi5Zdfxr333otRo0YhMTERCxYsQElJCdauXYvCwkJMmzYNADB16lQUFhbi9ddfx/Hjx7Fo0SIkJiZi1KhRAIDp06dj3bp12LFjB4qKirB06VJMnToVWq3Wl6fc42z9psxtybzZkzMQEcpn4ejiHrpjCBKigwG4pgtau+0w7JwuiIjI5/w2mQRck5YPGjQIDz74IB577DGMHz8eTz75JORyOXJzc9HQ0IApU6Zg69atWL16NZKSkgAASUlJWLVqFbZu3YqpU6eirq4Oubm5kMlcpztx4kTMmTMHS5YswcMPP4z09HQsWLDAl6fa4xwqa8DH351flejesQMxcmi8DyMif6dSyjH7rqGQy1zDw8uqWvA+pwsiIvI5v76fGBISgmXLlmHZsmXt6pKTk7Fhw4ZLHjtmzBiMGTPmkvWzZ8/G7NmzvRInXZsmgwX/u+2QtOb2kL6RuHdcKswWu0/jIv+XHB+KyaP7Y9NXpQCAjZ8fRc7gWERrg65wJBERdRa/vjJJ3Y/TKWLtR4fQcnaFmzCtErMuuNpEdCW3j+iDQb0jALiWW/zru/tgObv0JhERdT0mk9Sltn93EsXlTQBcc1nPvmsIwrlUIl0DmUzAb+8cAo1KDgCoqjNi487jPo6KiKjnYjJJXab4VCO2flsmle+6qS+G9I3yYUQUqKLD1Zg+LlUq79xfiYMn6n0YERFRz8VkkrpEi9GKv207BPHsg5KD+0Tg7pv6+TYoCmg3psdjVEaCVF737yMwnJ2vlIiIuo5fD8Ch7sEpivjf7YfRbHCtaBMaHIRZdw2FjM9J0lXSaoIgl8sQGRnstv2xX2biSFkDmgwWNBms+GBXKf77/mGXbMfpFP1+RSoiokDDZJK8Ijxcc8nkcPPO4zhU1iCV/3BfNvr3iXTbRy6XQRCYXNLFyQQBThHtrjwKgoCH7xqKFe+6FinYXXAG16XE4Ib0hHZthGqVvBVDRNQJmEySV8hkAgSZgFaj+3raR0814p+fnl/F6M6b+2Fgn8h2SUF4CCcrp8szmKz44LOjbtvOrXveJzYE5bUGAMDaLQdx7GQDND9ZSWna+EEI0XAKISIib2MySV7TanT/ZW+1OfBVwRk4zz4oGRWqApzOdgkBAMy8J6PL4qTuZ2i/KNQ1t8FkscNmd2J/SR1GDY3j1W4ioi7Auz7UKURRxP6SOrSdnf8vSCFDziAdZPzlTp0gSCFD9sAYqVzX3IaS080+jIiIqOdgMkmdovRMC2oazw90yB4Y0+62I5E3RYerkdo7XCofLW9CfUubDyMiIuoZmEyS1zW0tuHIqUap3D8xDPFRwZc5gsg7UntHICrM9RylCGDfUT2sNq6OQ0TUmZhMkldZ7Q7kH9VL80lGhCgxJDny8gcReYlMEJCTqkOQwvWjzWx1oOB4PcRzH0giIvI6rySTW7ZsQWNj40Xr9Ho93nzzTW+8Dfk5URRRUFIPs8V1JUghF3D9IB3nk6QupVEpkJ1y/vnJ6gYTTla3+jAiIqLuzSvJ5MKFC1FRUXHRugMHDmDlypXeeBvyczt+KEd1g0kqZw+MQbCaU7FQ14uPDka/hFCpfKisAaeqWnwYERFR9+XxiIgHH3wQRUVFAFxXpB588MGLTsPR1taGoUOHeh4hBYSjpxrx7o7zU/70SwhFQrTWhxFRTzekbxQaWixoNlrhFIHXNhbgr0+M9nVYRETdjsfJ5DPPPINPPvkEoihizZo1mDhxIuLj4932kclkCAsLwx133NHhQMl/tRiteGlDPhxO13Np4VolhvSN8nFU1NPJZQJyBumwq+AMHE4RtY1mvLqxAL+7awinqCIi8iKPk8mUlBQ8/vjjAFxLmk2bNg1xcXFeC4wCg8PpxBtbD0pTsAQp
ZLh+sA5yPidJfiBEE4TsgTHYe1QPANhzuAbJsSG444ZkH0dGRNR9eGXiv3NJZXNzM8xmM5xOZ7t9EhMTvfFW5Gf+9fUJFJc3AQAEAMNSY6Dlc5LkRxJjtOjfasGJM65nJjfvKkW/hDCkcZYBIiKv8EoyeeLECSxcuBAHDhxoVyeKIgRBwJEjR7zxVuRH8o/W4j955VJ50s8HwME5/cgPDUmOhEIhw7HyJogi8LetB7H44RGIDOWa8EREHeWVZHLJkiWorq7G008/jfj4eK6H2wNU1Rvx5sfn/0DIGRyLSaMHYPMXx3wYFdHFyWQCHv9lJv60Ng9NBgtaTDbkbinCU9OHQSHndLtERB3hlWSysLAQL7/8MsaPH++N5sjPtVntyP3woLTudky4Gk/8Kot/RJBfiwxT48lfD8PitXlwiiJKK1uwcedxTB+X6uvQiIgCmlf+JI+OjoZcLvdGU+TnRFHE//2nGJV1RgCuATePTc5AaLDSx5ERXVl6/2hM/Xl/qfz53tP44XCNDyMiIgp8XkkmH3roIaxevRr19fXeaI782Od7T+PHI7VS+YFbByE5PvQyRxD5l9tH9EFOqk4qX/jHERERXTuv3Obeu3cvKioqMHr0aCQmJkKtVrvVC4KAjz76yBtvRT5UcroJG3cel8o/z0rEzdcl+DAiomsnCAIeviMNp/UG1DSaYbE5kPthEZ558HqolV75kUhE1KN45SenVqvFuHHjvNEU+akWoxWvbzkoTUzeLyEU9/NZMwpQwWoFHpucgT+/sxdWuxNV9Sb8Y8cxPHLnEF+HRkQUcLySTC5btswbzZCfcjpF/O2jQ2gyWAG4JoKeOykDQQqOgqXAlRQbgt/cPgh/3+6aleDbg9UY0jcKo9Ljr3AkERFdyCvJ5J49e664z/Dhw73xVuQDW78pw5FTjQBcE5PPvmsIosPVlz+IyM9oNUGQy2WIjAyWtk382QAcP9OKr/adBgCs33EUWWmxSIwJuWQ7TqeI5mZzp8dLRBQovJJMPvDAAxAEAaIoum2/cKoYTloemA6eqMf2705K5Ttv7Iv0/tG+C4jIQzJBgFMEDGab2/Zf3zYIxScbUN1gQpvVgZf/sQ+LHxkJ+UXmnwzVKr0zapGIqBvxSjK5ZcuWdtuMRiP27t2Ld999F6+99po33oY62YVXbACgrsmM/91+GOf+RLguJQYP3jX0outuy+UyzjNJfs9gsuKDz462257aOxy1jSY4RaDsTAuWv70Hqb0j2u03bfwghGi4XCgR0YW8kkwOHjz4ottzcnKgUqnw0ksvYf369d54K+okDqfodsXGbnfiL+v3otXk2hYZqsLsSekwW+wXPT48hMvSUeCKCFFhUJ9I6XGOoxVNiIvU8HNNRHQVOn0ejLS0NKxYsaKz34Y66KdXbA6WNeDEmRYAruckh/SNxI7vT17y+Jn3ZHRyhESdK6VXGKobTGhstUAUgf0ldfhZZuJFr8QTEdF5nfr4j8FgwD/+8Q/odLor70x+o7bRJCWSAJDWNxLRYRxwQ92bIAjIHhgjJY8tJhuOVTT5NigiogDglSuT2dnZ7Z6XE0URbW1tEEURzz//vDfehrpAm9WBfSV1UjkuUoMBiWE+jIio64RogpCWHImDZQ0AgJLTzYiPCkZkKG93ExFdileSyZkzZ1508EVISAhGjx6N/v37X+Qo8jeiKKLgeB2sNicAQBUkR9bAGA6soR6lX0IoqupNqG9pAwDsO6bHmKxEKC4yupuIiLyUTP7+97/3RjPkY2VVrahtPD9/XvbAGKiC5D6MiKjruW53R2Pn/jNwOEUY2+woPtWE9P5Rvg6NiMgveW0Ajl6vx7p167Bnzx4YDAZEREQgJycHv/nNbxAXF+ett6FOUl7disMnG6Ry/8QwxEZqfBgRke8Eq4OQ3i8KhaX1AIATVS1IiAm+wlFERD2TV+7bnDp1CpMmTcLGjRsRHx+PkSNHIjo6Gu+++y4mTZqEU6dOeeNtqJNY
bA6s2VSIs8tuI0yrRFpypG+DIvKxPnEhbn9QFZTUwWpz+DAiIiL/5JVk8i9/+Quio6Px+eefY/Xq1Vi6dCnWrFmDzz//HPHx8Xj55Zc7/B6LFi3CAw88IJUrKysxc+ZMZGVlYcKECdi1a5fb/nl5ebjrrruQmZmJBx54oF1Cu379eowePRrZ2dlYuHAhTCZTh2MMVG9tO4RKvQEAIJcJyEmN4XQo1OMJgoDMAdFQyF3fC8Y2Oz78qtTHURER+R+vJJN5eXl4/PHHERnpfjUrKioKjz76KH744YcOtf/9999j06ZNUlkURcydOxcRERHYtGkTJk+ejHnz5qGiogIAUFVVhTlz5uDuu+/G5s2bERMTg7lz58LpdA0s2bFjB1auXInFixfjnXfeQVFREZYvX96hGAPVwRP1+PjbMqk8tF8UQoOVPoyIyH9oVAoM6Xv+WcmPvytDCacLIiJy45VkUqPRQCa7eFMymQx2+8VXTbkaJpMJzzzzDIYNGyZty8vLQ1lZGZ599lmkpKRg9uzZyM7OlhLOjRs3YvDgwZg1axZSUlLwwgsvoKqqCnl5eQCAt99+GzNmzMDYsWORkZGBJUuW4MMPP4TRaPQ4zkBkMNuw7t/n10yPj9IgOS7EhxER+Z/kuBDEhLvmWRVFYPWmQtgdTh9HRUTkP7ySTF5//fXIzc1Fc3Oz2/ampibk5uZixIgRHre9YsUKjBgxwq2NwsJCDBkyBCEh5xOfnJwcFBQUSPXDhw+X6jQaDYYOHYr9+/fD4XCgqKjIrT4rKwsOhwNHjpxPrHqCf3x2DE0GKwDXc5KZAzgNENFPCYKAzJRo6dGP8upWbP/upG+DIiLyI14ZzT1//nz88pe/xC9+8QuMHDkSMTExqKurww8//ACFQuHxM5P79+/HJ598gu3bt2PdunXSdr1ej9jYWLd9o6OjUV1dfdn6mpoatLS0wGKxuNUrFApERERIx3uTUqmAThfq9XY7avf+SvxwuEYqP3L3UFTXd+C50bM5qFbr4eTOHT0+ANq4pjb9/Fz8rY2rarsDcWi1KmQO1GHf0VoAwMffn8K4G/qiX2L4NbcViPzxZ1h3wb7tXOzfruGVK5O9evXCli1bMG3aNNTW1iIvLw91dXWYNm0atm7digEDBlxzm1arFYsWLcLTTz+N8HD3H9hmsxlBQUFu25RKJWw2m1SvVCrb1VutVrS1tUnli9X3BPXNZuRuLpTKY4f3Rs5gTt9EdDmpfSKQ2icCAOBwilj53n7e7iYiQgevTIqiiG3btiEiIgKjR4/GggULAABOpxMzZ85Eamoq4uPjPWp7zZo1SE5OxoQJE9rVqVQqGAwGt21WqxVqtVqq/2liaLVaERERAZVKJZUvdbw3Wa12NDebr7xjFxFFESs+KITB7Eq8o8PUmD0pAza7E0ajpQMNu/7xuI2OHu/HbZy7EnZNbfrpufhbG9fUt16I47d3p2PRG9/BZnfiRGUzNnx8CBNH9fW4PX937qqOXt/q40i6H/Zt52L/XrvwcA2USs/SQo+vTNrtdjzxxBN46qmnpIEt59TX10Ov12PRokV48sknpVHU12Lbtm345ptvkJ2djezsbLz55pvYu3cvsrOzERcXB71e77Z/XV0ddDodAFy2/lxCWVd3fv1pu92OpqamdrfGu6NdBWdw8IRrcnIBwCMT0xCsDrr8QUQEAEiI0eL+8alSees3ZThT17MG7hER/ZTHyeT777+PXbt24ZVXXsH8+fPd6nQ6HT7++GMsX74cn376KTZv3nzN7a9fvx7bt2/Hli1bpFvo6enp2LJlCzIzM1FcXOw2N2R+fj6ysrIAAJmZmdi3b59UZzabcfjwYWRlZUEmkyEjIwP5+flSfUFBAeRyOdLS0q45zkBS39yG9788LpXHD++NwZycnOia3P2z/uiX4LrqYXeIeOvfR+A8N+M/EVEP5HEyuWnTJjzyyCO44447LrnPPffcg/vvvx/vvffeNbffq1cvJCcn
S6+wsDCo1WokJydjxIgRSExMxIIFC1BSUoK1a9eisLAQ06ZNAwBMnToVhYWFeP3113H8+HEsWrQIiYmJGDVqFABg+vTpWLduHXbs2IGioiIsXboUU6dOhVar9awzAoAoili/4ygsZ1fwSIgOxtQx/X0cFVHgkctlePiONGl0d+mZFnz6Y7mPoyIi8h2Pk8lTp065Ta9zKT/72c9w8uRJT9/mouRyOXJzc9HQ0IApU6Zg69atWL16NZKSkgAASUlJWLVqFbZu3YqpU6eirq4Oubm50lyYEydOxJw5c7BkyRI8/PDDSE9Pl5737K72HtXjwNl1hgHg4QlpCFLIfRgRUeBK0oXgrpv6SuV/fX0CZVUtvguIiMiHPB6Ao1arr2oJQlEU24289sQf/vAHt3JycjI2bNhwyf3HjBmDMWPGXLJ+9uzZmD17dofjCgTGNhv+8dkxqXxLdi+kJPWMKU2IOssdNyTjQGk9TpxpgcMp4m8fHcLih4ZDo/LKjGtERAHD4yuTaWlp+PLLL6+43xdffIG+fft6+jbkBVu/KUOL0TV6PSJEialjrn2qJiJyp5DLMPvuoVArXVf4axvN+OcFf7QREfUUHieT999/Pz788EN88MEHl9xn06ZN2Lx5M6ZMmeLp21AH1TaasHNfpVSePi4VwWpeOSHyhtgIDR64bZBU/vZgNfIOeX/xAyIif+ZxVjFu3Dj86le/wjPPPIN//OMfGDNmDBITE+F0OlFVVYXdu3ejuLgYt99+O+69915vxkwXCA/XQCa79BKI6/5dDMfZkaZpfaMw7oZkLplI5EWjhsbj4IkGfH82iXzn06Po3yscsREaH0dGRNQ1OnSJavHixcjMzMSbb76Jv/3tb251Q4YMwbJlyzBp0qSOvAVdgUwmQJAJaDW2X73nRGUzvjlwRipPGzsQxjZ7u/3CQ7yyEBJRjzXj1lSUVjajtsmMNqsDaz86hAW/HgaFnN9bRNT9dfh+56RJkzBp0iTo9XpUV1dDLpcjISEBkZGcv7CrtBqt+OCzo27bRFHEdwfPr72dEB2MguIaFBTX/PRwzJyU0ekxEnUHWk0Q5HIZIiOD3bZHAvjjjBwszP0WDqeIE2da8J89FfjNhPZz1zqdol+tikVE1FFee3hOp9NJK9CQ7zUaLKhvca1DLghAGicnJ+owmSDAKUJajvRCCTFa/PIXKXj/8xIAwIdflaJvfBiGDT6/slaoVun5g+pERH6KIzG6qbIz59cjTdKFIETDJROJvMFgan8n4BxRFBEboUFtk+vK46oPCjAmMxHas99/08YP4vciEXU7/CO5GzJb7DhTf3694P5nl34jos4lCAKGpcZAo3JNF2R3iNhzVA+Hw+njyIiIOg+TyW7oZHUrxLNLBUeHqRAeovJtQEQ9iDJIjusHxeLcpAktRiuKTjT4Nigiok7EZLKbcTidOFV9/hZ3v4QwH0ZD1DNFhqqQ3i9KKpfXGlBe03qZI4iIAheTyW6mUm+E1e66paZRyREfHXyFI4ioM/SND0UvnVYqHzjRgFNcv5uIuiEmk91MRa1B+rpvfBhknKCcyCcEQUDmgGiEBrsG3DidIl7bWADjRUaCExEFMiaT3Uib1YH6FotU7h2rvczeRNTZFHIZhg+KhULu+qOuttGMVzcWwHnuoWYiom6AyWQ3UnXBCO7oMBXUSs78RORrIcFByEqJkcp7Dtfgkx/KfRgREZF3MZnsRqrqTdLXiTG8KknkLxJjtOifeH4w3OZdpSg+1ejDiIiIvIfJZDdhsTpQ19wmlROiOPCGyJ8MSY5Eau8IAIAoAm9sPYjGVsvlDyIiCgBMJruJqobzVyWjwlRQq3iLm8ifyGQCHpuWifAQJQCgxWTDG1sPws4JzYkowDGZ7CbO1J1/XjIxmre4ifxRVJgaT04fJk1oXnK6GZt3lfo2KCKiDmIy2Q20mqyov/AWN+eWJPJbGQNiMGV0f6n86Y8V2Ftc68OIiIg6hslkN3D4RAPO
TTQSGaKEhre4ifzahBuS3UZ4r/v3EVRf8KgKEVEgYTLZDRwqq5e+1kVofBgJEV0NmSDgkTvToItQA3DNEbvmwyJYrA4fR0ZEdO2YTHYDh8sapK9jmEwSBQStOghzJ2VAIXf9GK7UG7F22yE4nZzQnIgCC5PJAKdvNKPm7O0xuUxAZKjKxxER0dVKjg/FA7emSuX9JXXYuPO4DyMiIrp2TCYD3IHjddLXUWEqyGVci5sokPwsMxG3jegtlXfsqcCX+077MCIiomvDZDLAHSg9n0zGhPMWN1EgmnZLCoal6qTyP3Ycw7dFVT6MiIjo6jGZDGCiKKLogiuTunC1D6MhIk/JBAGz7hqCfgmhAAARrhHeeYerfRsYEdFVYDIZwKrqTdJybEFymbSyBhEFHlWQHH+4NwtJuhAAriUX/77tCK9QEpHfYzIZwI6capS+jg5XQxD4vCRRIAvRBOGP92chMca1ipVTFPHmx0fw0bdlEEWO8iYi/8RkMoCdONMsfX1uvjoiCmxhwUr8v/vOX6EEgC27y/DWv4ths3MeSiLyP0wmA1hcpGvZxGC1AvFRXEKRqLsID1Fh4YxhGNI3Utr2TVEVlm3Yh7pmsw8jIyJqj8lkALvrpr7408wReHbWDVxCkaib0agU+K9pmbgpPV7adrK6Fc/+314cumChAiIiX2MyGcAEQUD2oFjERWt9HQoRdQKFXIaZE9MwfdxAaQ5Zg9mGv75fgO3fnYSTz1ESkR/g5SwiIj8mCALGXd8byfGhyN1yEM0GK0QA//r6BCrqjHji3ixoNUHX3K7TKaKZt8yJyAt4ZZKIKAAMTIrAkoeGY0i/KGnbnsM1ePK13ThysgEGs+2qX4JMgIyrZRGRl/DKJBFRgAgPUWHprBuwbtsh/Of7UwCAmgYTnvnb98hMiXYbAX4508YPQogHVzOJiC6GVyaJiAKIQi7D9NsGI2eQTnqO0uEUse9YHYpO1MPp5HOURNS1mEwSEQWgXjFajM5MQIjm/A2msqpWfHeoGlYb56Mkoq7j18lkeXk5Hn30UQwfPhyjR4/G8uXLYbG4lg+srKzEzJkzkZWVhQkTJmDXrl1ux+bl5eGuu+5CZmYmHnjgAZw6dcqtfv369Rg9ejSys7OxcOFCmEymLjsvIiJvCA1W4mfXJSIh+vw8sw0tFnx7sBptVrsPIyOinsRvk0mr1YpHH30USqUS7733Hl5++WV8/vnnWLFiBURRxNy5cxEREYFNmzZh8uTJmDdvHioqKgAAVVVVmDNnDu6++25s3rwZMTExmDt3LpxOJwBgx44dWLlyJRYvXox33nkHRUVFWL58uS9Pl4jII0EKGa4fpENa8vkJzltNNnxbVA1TGxNKIup8fptMHjhwAOXl5Vi2bBkGDBiAESNG4IknnsC2bduQl5eHsrIyPPvss0hJScHs2bORnZ2NTZs2AQA2btyIwYMHY9asWUhJScELL7yAqqoq5OXlAQDefvttzJgxA2PHjkVGRgaWLFmCDz/8EEaj0ZenTETkEUEQMDApHNkDY3BujLaxzY7vDlbDbGFCSUSdy2+Tyf79+2Pt2rXQas9PyC0IAqxWKwoLCzFkyBCEhJwfuZiTk4OCggIAQGFhIYYPHy7VaTQaDB06FPv374fD4UBRUZFbfVZWFhwOB44cOdL5J0ZE1El6x4bg+sE6nJv1x2RxJZRtTCiJqBP57dRAUVFRuPHGG6Wy0+nEhg0bkJOTA71ej9jYWLf9o6OjUV1dDQCXrK+pqUFLSwssFotbvUKhQEREhHS8NymVCuh0oV5v90KCYIdWq+pwOx1qQ+hgGx09PgDauKY2/fxc/K2Nq2rbD85FEAQoFPIO/0y43Pd8ilaFYI0Suwsq4RRdVyi/P1yLscN7S8uuXmscnf0zrCdj33Yu9m/X8Nsrkz+1bNkyHDlyBE8++STMZjOCgtznSFMqlbDZbAAAs9kMpVLZrt5qtaKtrU0qX6yeiCjQJepCcFNmIoSzyW+r
yYqdeys4KIeIOoXfXpk8RxRFPP/883j33Xfx6quvYuDAgVCpVDAYDG77Wa1WqNVqAIBKpWqXGFqtVkREREClUknlSx3vTVarvVOXLIuMDIYoijAaLR1uq0NtiB1so6PH+3Eb564gXVObfnou/tbGNfWtH5yLKIqw2x1obPR89oir/Z6P1CqRk6pD/lE9RADNRiu++LEcN6bHX3Uc567q6PWtHsdLF8e+7Vzs32sXHq6BUulZWujXVyadTieefvppvPfee1ixYgXGjRsHAIiLi4Ner3fbt66uDjqd7or15xLKuro6qc5ut6OpqandrXEiokCWGKNFdmqMVG4x2fD9oRoYzTYfRkVE3Y1fJ5PLly/Htm3bsGrVKtx6663S9szMTBQXF7vNDZmfn4+srCypft++fVKd2WzG4cOHkZWVBZlMhoyMDOTn50v1BQUFkMvlSEtL6/yTIiLqQkm6EGQNPJ9QNhuteGlDPkxtTCiJyDv8NpksKCjA22+/jXnz5iE9PR16vV56jRgxAomJiViwYAFKSkqwdu1aFBYWYtq0aQCAqVOnorCwEK+//jqOHz+ORYsWITExEaNGjQIATJ8+HevWrcOOHTtQVFSEpUuXYurUqW4jx4mIuos+sSHIHBAtlUsrm/HcWz/yGUoi8gq/TSY//fRTAMArr7yCm2++2e0liiJyc3PR0NCAKVOmYOvWrVi9ejWSkpIAAElJSVi1ahW2bt2KqVOnoq6uDrm5uZDJXKc7ceJEzJkzB0uWLMHDDz+M9PR0LFiwwGfnSkTU2ZLjQ5HRP0oqF59sxKsfHICFSy8SUQf57QCcp556Ck899dQl65OTk7Fhw4ZL1o8ZMwZjxoy5ZP3s2bMxe/bsDsVIRBRI+iWEwSmKOFTWCAA4WtGEVZsP4IlfXocghdzH0RFRoPLbK5NEROR9AxLDcd/4VKl8+GQjXtt0gLe8ichjTCaJiHqYiTf1w/TbBknlQycb8eI/96PFyLl2iejaMZkkIuphtJog3Dd+EO6/4ArlyepW/OWf+2B2OBEZGYzIyGCp7lz5p6/wcI0vwiciP8Nkkoioh5EJApwicMdN/fDQxCHSSjlV9Sb8v9e+wTeFZ2Aw29BssKDZYIHBbGv3EmQCZOcWASeiHs1vB+AQEVHnMZis+OCzowCA6wfpkH+sDk6nCJPFjlf+uQ9pyRHITI2FIAgXXW1n2vhBCNEEtdtORD0Pr0wSEfVwCdFa3JwRD43y/IjuI6ea8O2BM7DZnT6MjIgCAZNJIiJCRIgKozMTER2mkrZV1Bjw2Y+nYODyi0R0GUwmiYgIAKBSyjFqaDz6JYRK25oNVuwqOIPymlaIoujD6IjIXzGZJCIiiUwmIKN/NLJSoiE7OzLH4RRRcLwe+Uf1sNq5Yg4RueMAHCKiLqLVBEEul7lNu3Ot5HIZBKHzR1H3iQtFXEwIvi+qkuafPFNvQmOrBcNSdZ3+/kQUOJhMEhF1kXNT8nTkGcTwENWVd/KSqDA1brshGT8erMKpGgMAwGx14NuD1QgPU+M3E9K6LBYi8l9MJomIutCFU/J4YuY9GV6M5soUchkyU2IQG6lBwfF6aXT3v787iX1Ha/HA+FSk9Y3q0piIyL/wmUkiIrqihGgtfp6ViJhwtbStut6El94rwBtbD6Khpc2H0RGRL/HKJBERXRWNSoFRQ+NQXmtASUUzTBY7AODHI7UoKKnDrSN647YRfaBVczJzop6EySQREV01QRCQHBeK3/8qG+/8+wi+3l8JALDandj+3Sl8ua8S9/ysP+68uR+CryKpdDpFNDebOztsIupETCaJiOiaxYRr8N/TczA6qxfW/+cITlW3AgBMbXa8+9kxbPumDBNv6odxI3pDrbz4r5pQrZLPWhF1A0wmiYjIIwaTFQeO1eK6/lGICVOhuLwJxjbXrW+D2Yb3Pz+Gf311HP0SQtEvIQyqILnb8Vzfm6h7YDJJREQdIggCeulCkBCjxWm9EcfKm6TnKW12
J45VNKO0sgV94kLQPzGMz1QSdTNMJomIyCtkgoA+sSFIitGivNaA46fPD9JxOEWUVbXiZFUrEmO0SOkV5uNoichbmEwSEZFXyWQC+saHok9cCKrqjDhe2YLms6voiAAq64yorDOiwWDFPaP7o29siLR0IxEFHiaTRETUKWRnb38nxmihb27D8dPNqGs+Px9lUWk9ikrrERupwS+GJeHmjAQEq/lriSjQ8LuWiIg6lSAIiI3QIDZCgyaDBccrm3GmziTV1zaa8d4XJfjw6xMYlR6PsTlJ6BWj9WHERHQtmEwSEVGXiQhR4fpBsTAm26ANVuGLvRUwnl2r3GJz4Kv9lfhqfyUyBkRj4k39cH1aHOSyy98C51yVRL7FZJKIiLqcVh2EmXcNxfTbB+PT70/isx/LcbrWINWfuwUeE6HB+BF9MCa7F7QXmUaIc1US+R6TSSIi8hm73Yn6RhOyU6LRJzYEZVUtqK43QTxbX9dkxrs7jmLj58fQOzYE/RJCERqslI7nXJVEvsdkkoiIfE4QBMSEqxETrobZYsfJ6lacqm6F1e4E4Jpa6GR1K05Wt0IXoUb/hDDERmp8HDURAUwmiYjIz2hUCqQlRyI1KRyn9UacqGpBq8km1eub2qBvaoNWrUBURDBuuyHZh9ESEZNJIiLyS3K5DMln56usb27DiaoWVDecH2hjbLPj7X8fwaadx/GzjAT8IqcXYsJ5tZKoqzGZJCIivyYIAmIiNIiJ0MDYZkNZVSvKa1phd7ierDSabfjkx3J8uqccmQNicFNGAjJToqGQc2gOUVdgMklERAFDqw5Cer8oDO4dgQq9AbVNbahpcM1ZKYpAwfE6FByvQ2hwEEYNjcfNGQlIig3xcdRE3RuTSSIiCjgKhQz9EsKweNYoFJbosfXrEzhwvE6qbzXZsGNPBXbsqUCf+FCMHBqPG4bGo19iGISLLN3IuSqJPMdkkoiIApZCLkNOWjxS+0Sipt6IrwvOYHdBJRpbLdI+5dWtKK9uxQdflCAmXI2cwbHITNUhtXckVEo556ok6iAmk0REFNAMJis++OyoVL45Ix76pjaU17aiut4Ep3h+37rmNnz6Qzk+/aEcggBEhqowOqsXcgbHQheqgjJI7oMzIApsTCaJiKhbEQQBsZEaxEZqYHc4UdtoRlWDCTUNJmnQDuB6xrKhxYItX5/Alq9PQBCAhGgt+sSFoE+saxR579gQt0nSiag9JpNERNRtKeQyJMZokRijhdMpoq7FNWCnrrnNbe5KwJVcnqkz4kydEXmHaqTtYVoleseFoHdsKJJiQ9A7zvVvZKjK7flLPndJPRWTSSIi6hFkMgGxERrERrjmomyzOlDf3IaYyGAcKqtHZa0B4kWOazFacehEAw6daHDbHqxWoJcuBL10WvTvFY4kXQhCVQpEhakuOsiHqLtiMklERD2SWilHL50WM+/JgMFswz//cxgtRiuajVY0G1z/Gsw2OJwXSzEBU5sdJRVNKKlowlf7KqXtKqUcidFa9Dp7RTQxxvU1k0zqrnpsMmm1WvHcc8/hk08+gVKpxEMPPYRZs2b5OiwiIvIRhVyGqDA1osLU0jZRFGGy2GEw2dBqtqHVZIPBZEWr2eb2/OWFLFYHyqpaUFbV4rbdlWQGIzpcg6S4UMRGBkMlA6LC1AgNDkJocBCCFBwARIGnxyaTL774Ivbv34+33noL1dXVmD9/PhITEzFx4kRfh0ZERH5CEARo1UHQqoMQd8F2URTRZnWg1WRDq9mKRF0oKvUGlFe3wmC2XbQtV5LZirKqVuwtrr3oPmqlHCGaINcrOAihmiBoNUEIVikQrFJAo1a0+1pz9sUVf8hXemQyaTKZsHHjRrzxxhtIT09Heno6fvvb32LDhg1MJomI6IoEQZCSuNhIDWbekwEIAloMbWg2WFGpN7hetQZU6o2o1BsumWReqM3qQJvVgbrmtmuOSRUkh1ajQLA6CKogGTSq9omn+9dB0KjkCFa7/lUF
yXkbnjzSI5PJ4uJiWK1W5OTkSNtycnKQm5sLu90OhaJHdgsREXWAwWTFps+PuW0LUSswqHc4UpPCYLE5YTTbYLbYYRcBY5sNrQYrzFY7rDYn7A7nJZ/PvBoWmwMWmwMNLZYr73wRMpmAYLUCWnUQgtUKqBRnE1K1AmqlAkEKGYLkMte/ChkUF3wdJJdBcfZfmeBqSxAEyGQCZIIAmQyufwUBgkyQ9pEJAgQBEOD6F4BbQuuqO18QAOCCbef2ddsPAkxtrsTdbLFL7bsf59rvwtz5wv3Ovg2T66vUI7MmvV6P8PBwqFQqaVtMTAxsNhsaGhoQGxvrw+iIiKi7EQQBaqUcaqXrmUit1vX7x2g8n/g9fHc69E1mvPdpMax2B6w2J6w2B2x2J2wOJ2x2V8Jpszths4uwOZywX1DXUU6nCIPJBoPpyldQexrhgi+EiyShFyamwgX7uSem51s6X776hLV3bAhm3TUEESGqK+/cxQRRFD3/MyhAbdmyBa+88gp2794tbauoqMC4cePwxRdfICkpyYfRXTtRFDv016xc5vog+7INf4iBbbCNzm7DH2JgG920DRFwnv117nCKsNmdcDpdVzqdTvHy/4oiHE4nxI7no9TJEmK0iDk7tZU/6ZFXJlUqFaxWq9u2c2WNxv/+k65EEAQo5B2/FO8PbfhDDGyDbXR2G/4QA9vo3m1o/O/iFXVjPXLoV1xcHFpaWtwSSr1eD6VSifDwcB9GRkRERBRYemQymZaWhqCgIOzfv1/alp+fj6FDh3LwDREREdE16JHJpEajwaRJk7B06VIcOHAAX3zxBdatW4ff/OY3vg6NiIiIKKD0yAE4AGA2m7FkyRLs2LEDWq0WM2fOxMyZM30dFhEREVFA6bHJJBERERF1XI+8zU1ERERE3sFkkoiIiIg8xmSSiIiIiDzGZJKIiIiIPMZkkoiIiIg8xmSSiIiIiDzGZJKIiIiIPMZkkoiIiIg8xmSSiIiIiDzGZJKIiIiIPMZkkoiIiIg8xmSSiIiIiDzGZJKIiIiIPMZksocpLy/Ho48+iuHDh2P06NFYvnw5LBYLAKCyshIzZ85EVlYWJkyYgF27dvk42sBTWlqKhx56CNnZ2bjlllvw97//Xapj/3rPokWL8MADD0hl9m3HbNu2DYMGDXJ7zZ07FwD71htsNhuWLVuGkSNHYuTIkVi8eDGsVisA9m9H/Otf/2r3uT33OnPmDPu2CzGZ7EGsViseffRRKJVKvPfee3j55Zfx+eefY8WKFRBFEXPnzkVERAQ2bdqEyZMnY968eaioqPB12AHDZrNh1qxZSEhIwJYtW/CnP/0Jubm5+Oijj9i/XvT9999j06ZNUpl923HHjx/H+PHj8c0330iv5cuXs2+95MUXX8Rnn32G3NxcvP7669i9ezfWrFnD/u2gO+64w+0z+/XXX2Po0KG47bbbkJCQwL7tSiL1GHv27BGHDh0qGgwGadtHH30k3njjjeJ3330nZmRkiK2trVLdgw8+KP71r3/1RagBqaKiQnziiSdEs9ksbXvsscfE//mf/2H/eonRaBTHjh0r3nfffeKMGTNEURTZt17w2GOPia+99lq77ezbjmtubhaHDh0qfvPNN9K2zZs3i4888gj718vWr18vjhw5UmxqamLfdjFemexB+vfvj7Vr10Kr1UrbBEGA1WpFYWEhhgwZgpCQEKkuJycHBQUFPog0MCUlJWHlypVQq9UQRRH5+fnYs2cPRo0axf71khUrVmDEiBEYMWKEtI1923HHjx9Hv3792m1n33Zcfn4+1Go1brzxRmnblClT8Pe//53960UGgwGrV6/GvHnzEB4ezr7tYkwme5CoqCi3H2hOpxMbNmxATk4O9Ho9YmNj3faPjo5GdXV1V4fZLYwePRrTp09HdnY2brvtNvavF+zfvx+ffPIJnnrqKbft7NuOsVqtqKiowM6dO3Hrrbdi3LhxePnll2G1Wtm3XlBeXo5evXph+/btmDhx
Im655Rb85S9/Yf962fvvvw+lUolp06YB4M+FrqbwdQDkO8uWLcORI0ewadMmvPXWWwgKCnKrVyqVsNlsPoousOXm5qK2thZLlizBsmXLYDab2b8dYLVasWjRIjz99NMIDw93q2PfdsypU6dgt9sRHByM1157DeXl5Xj++edhNBphsVjYtx1kNBpx+vRpbNiwAUuXLoXRaMTSpUtht9v52fUSURTx/vvvY8aMGVJ/sm+7FpPJHkgURTz//PN499138eqrr2LgwIFQqVQwGAxu+1mtVqjVah9FGdgyMjIAAG1tbXjqqacwdepU9m8HrFmzBsnJyZgwYUK7On52O2bgwIHIy8tDZGQkAGDw4MEQRRFPPvkkpk2bxr7tIIVCAYPBgJdeegl9+vQBAMyfPx/z58/H5MmT2b9ecOjQIZSXl+Oee+6RtvHnQtdiMtnDOJ1OLFq0CNu2bcOKFSswbtw4AEBcXByKi4vd9q2rq4NOp/NFmAGppqYGBw8exNixY6VtAwYMgM1mg06nw7Fjx9z2Z/9evW3btkGv1yM7OxuAa+S8w+FAdnY2fve73/Gz20HnEslzzn1uY2Nj2bcdFBsbC4VCISWSANCvXz9YLBb+XPCSr7/+GpmZmYiLi5O28Xda1+Izkz3M8uXLsW3bNqxatQq33nqrtD0zMxPFxcUwmUzStvz8fGRlZfkgysBUWlqK3//+96ivr5e2HTp0CFFRUcjJyWH/dsD69euxfft2bNmyBVu2bMG0adOQnp6OLVu28LPbQTt27MCNN94ozXsIAIcPH0ZYWBiysrLYtx2UlZUFu92Oo0ePSttKS0uh1WrZv15SWFiI4cOHu23jz4WuxWSyBykoKMDbb7+NefPmIT09HXq9XnqNGDECiYmJWLBgAUpKSrB27VoUFhZKDzPTlQ0fPhwDBgzAggULUFpaip07d+KVV17Bo48+yv7toF69eiE5OVl6hYWFQa1WIzk5mX3bQcOHD4coivjTn/6EsrIyfPXVV3jxxRfxyCOPsG+9oG/fvhg7diwWLlyIgwcPYu/evXj55Zdx7733YtSoUexfLygpKUFKSorbNn52u5hPJyaiLrV8+XIxNTX1oi+bzSaePHlS/PWvfy2mp6eLd9xxh7h7925fhxxwKisrxd/97ndidna2ePPNN4tvvPGG6HQ6RVEU2b9e9Ne//lWaZ1IU2bcddejQIXHGjBliVlaWePPNN4urVq3i59aLWltbxQULFojDhg0TR4wYIb7wwgui1WoVRZH96w0ZGRnizp07221n33YdQRRF0dcJLREREREFJt7mJiIiIiKPMZkkIiIiIo8xmSQiIiIijzGZJCIiIiKPMZkkIiIiIo8xmSQiIiIijzGZJCIiIiKPMZkkIiIiIo8xmSQiIiIijzGZJCIiIiKPMZkkIiIiIo8xmSQiIiIijzGZJCIiIiKPKXwdABERAQaDAStXrsQXX3wBvV6PkJAQjBkzBosWLUJYWBgsFgteeuklfPzxx7BYLJgwYQKio6Oxfft2fPnll1I777zzDjZs2IAzZ84gOTkZjz32GO644w4fnhkRdXdMJomI/MCTTz6JkpISPPnkk9DpdCgsLMSrr76KyMhILFiwAE8//TR27tyJJ598EomJiVi3bh0++ugj6HQ6qY3Vq1fj9ddfx6xZs3D99ddj165d+O///m8IgoAJEyb48OyIqDtjMklE5GMWiwU2mw1LlizB6NGjAQAjR47E/v378eOPP6KsrAzbt2/HsmXLMGXKFADADTfcgLFjx0pttLS0YO3atfjtb3+L//qv/wIA3HzzzTAajXjllVeYTBJRp2EySUTkYyqVCuvWrQMAnD59GidPnkRJSQlKS0uhUqmwZ88eAMC4ceOkYzQaDcaMGYMffvgBAFBQUACLxYKf//znsNvt0n6jR4/G5s2bUVFRgd69e3fhWRFRT8FkkojID3zxxRdYtmwZKioqEBkZifT0dKjVajidTjQ2NiIoKAhhYWFu
x8TExEhfNzU1AQDuu+++i7av1+uZTBJRp2AySUTkYydPnsQTTzyByZMnY8OGDYiPjwcAPPHEEygtLUVsbCxsNhtaWlrcEsqGhgbp69DQUADAmjVrEBcX1+49+vXr18lnQUQ9FacGIiLyscOHD8Nms2H27NlSImkymZCfnw9RFDFs2DDIZDK3UdtWqxW7d++WypmZmQgKCkJ9fT0yMjKkV0lJCdasWdPl50REPQevTBIR+VhaWhrkcjleeukl3H///WhsbMS6detQV1cHpVKJ5ORk3HXXXfjzn/8Mk8mEXr164Z133oFer0diYiIAICoqCg888ACWL1+O5uZmXHfddSguLsaKFSswduxYhISE+Pgsiai7EkRRFH0dBBFRT7dt2zasXr0aZ86cgU6nw+jRozFo0CA8++yz+OqrrxAaGorly5fjk08+gd1ux5133onW1lYcP34c27ZtAwA4nU68+eab2LhxI6qqqhAbG4s777wTjz/+OJRKpY/PkIi6KyaTRER+rqGhAd9++y1uueUWtyuM9913H2JiYrB69WofRkdEPR1vcxMR+Tm1Wo2lS5fik08+wX333QeFQoH//Oc/KCgowFtvveXr8Iioh+OVSSKiAHDgwAGsWLECBw8ehM1mw6BBgzBnzhz8/Oc/93VoRNTDMZkkIiIiIo9xaiAiIiIi8hiTSSIiIiLyGJNJIiIiIvIYk0kiIiIi8hiTSSIiIiLyGJNJIiIiIvIYk0kiIiIi8hiTSSIiIiLyGJNJIiIiIvIYk0kiIiIi8hiTSSIiIiLyGJNJIiIiIvIYk0kiIiIi8tj/B1LT7uoh77UEAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 720x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots(figsize=(5, 2))\n",
    "sns.histplot(data=df_train, x='age', kde=True, bins=30, ax=ax);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "6fd99dea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeMAAAEYCAYAAABvIp7iAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAABYlAAAWJQFJUiTwAAAeyklEQVR4nO3deVxVdf7H8fdVBExcGzCX3MpMQQEtDDUqNMMtM0eHMh42Zi7kaIu5ZP5SrDB3cyt/plFqajFpNuWkOZaZlpkylZLoT3PJTA0XFkW45/cHwxmvgoCAXy68no8HD7nnfM/3fj4X7nlzzj3X67AsyxIAADCmgukCAAAo7whjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAM8yjKxk6npczMrOKqxQhPz+yHICMj03AlJYP+3Bv9uTf6c2+X9+fhUVEVKjhK5L6KFMaZmVk6cya9uGoxwte3qiS5fR95oT/3Rn/ujf7c2+X9Va9e2Q7o4sZpagAADCOMAQAwjDAGAMAwwhgAAMMIYwAADCOMAQAwjDAGAMAwwhgAAMMIYwAADCOMAQAwjDAGAMAwwhgAAMMIYwAADCOMAQAwjDAGAMCwkvlgxjJgwIBHTZfgthYvXm66BABwKxwZAwBgGGEMAIBhnKYugKrNI43d97k9K0pFHfm5tE4AQOFwZAwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYVmrCeN26f+iZZ6K1bt0/TJcClDs8/wCzSk0Yr1kTrzNnTmvNmnjTpQDlDs8/wKxSE8YXLpx3+RfA9cPzDzCr1IQxAADlFWEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhHqYLAFC6DBjw6DVvW6VKVaWmnpMkeXvfoPPn0yRJ9es30JEjhyRJzZo1188/73HZzs/vJv3++2+SpFatWuvf//5ektSiRSvt3v1vSVJ4eGdt3PiZy3adO3fVZ599IkmKiOiudes+vmL5X/4SpfffXyan02lvN2jQMP3v/86TJI0f/4p++GGnJCk09G5NmDBWkjRhQqy2bt2s8+fPa9Omz+VwOPTii5P07rtvSZL++tfBmjr15f/M8bJ8fHxcasvMzNTKlSslSe3ahWv9+k8lSV269JCHR9673szMTH366dorxqakpGjSpBclSc88M0YzZ06WJD3//ItasuRNSdLf/vacvL29CzRfUV3aX1hY52Kbt7iUVN8lxWFZlnWtG2dkZOrMmfRiKeTSHcDixcuLZc6C8PWtKkk6ceJcnvVUbR553eq53Lk9K0pFHfm5tM7S8PMrK65Xf0UJYHfncDiUsxv09vbW+fPn//N9ZZ0/77p/8/CopMzMi1eMrV//ZsXEvOYy9rPPPtWKFe9KkoKD79DOnd9Jkh55JEr3398lz3ou3e7SsePHj9bRo4f/U4eHMjMzr6izdes7NGzYswWar6hKat7iUhz1Xf78q169sjw9SybUOU0NoFy79HgkJ1yzv7/yQCMniC8fe+TIYe3bl2TfPnfurD76KN6+nRPEkrRmTbzOnTubay2Xb5czdt++JDuI
s+vIzLXO77//TseO/ZrvfEVVUvMWl9JeX24IYwAoBrNnT7W/X706XmlpabmOS0tL05o18bmuu3y7nLGzZ08pcB3z58/Od76iKql5i0tpry83hDFQzpXnU9TFKTU1RevXr9ORI4e1adOGq47dtOlzHT16xGVZXtv9618blJqaWuA6jh49rO+//y7P+XK778IoqXmLS2mvLy+EMVCOffnll6ZLKFNWrVqm5cvjlN+lOE6nUytWvGuPsyxLK1cuzXW7a7msZ8mShXrvvXdy3fby+y6Mq9VZlHmLS2mv72oIY6Ace/vtN0yXUKZkZWUpMXF3gcb+9NMP+ve/d0mSEhJ26qeffii2OlJTU7Rnz08Fuu/CyK/Oa523uJT2+q6mVF7rzWkz98bPDyiYFSuWqlmz5lq5cpmR+/b3b1ngt/xkv5Up/zoLO29xKe315YcjYwAADCOMAcCQyMjH5O3trb/8pZ+R+y7M0aGHh0eB6izsvMWltNeXn9JXkUrHfxrBqdZrVxp+fmVFSff35Zdf8rpxMapYsaKaNm1WoNeN/f1bqlWrIElSYGCw/P1bFtvrxlWq+KhBg4Z5vm586X0XRn51Xuu8xaW013c1HBkD5VhYWJjpEsqUvn37qV+/x+VwOK46rkKFCoqMjLLHORwOl9uXym+u3Pz1r4P06KP9c9328vsujKvVWZR5i0tpr+9qCGOgnLueZzLKsipVfHT//RGqV6++7r2301XH3ntvR9WrV99lWV7b3XdfJ1WpUqXAddSrd7Nat74jz/lyu+/CKKl5i0tpry8vhDEAFIMRI563v3/ood664YYbch13ww03qGfP3rmuu3y7nLEjRowqcB3R0SPyna+oSmre4lLa68sNYQygXLv0tOWln3jk7V35irEeHpVyHVu//s269dam9u2qVavpwQf/u/MPDr7D/r5nz96qWrVarrVcvl3O2Ftvbap69W6+pI7/Xu5zaZ2tW9+hOnXq5jtfUZXUvMWltNeXm1J5ARcA91S9eg2dOXNaUvYOMec/52/UqLEOHjwgSWrePEB79vzosl3duvX0669HJbl+wlFgYLASErI/3rBLlx72R+Ll6Nq1hz75JHtZjx4Pae3a1ZKkiIgeWrcue3lU1F+1bFmcy0coDhkyQm+++bokKSZmsr7/Pvv+2rW7Wy+9NNZevmXLly4fofg///Oy4uIWSXL9CMXRo8df8ViEh98vDw/rP/P+9yMU77vv/qs+huHh9+vChfNXjB09erz9EYrPPTdG06dPtpe/9Vb2RXgDB0YXeL6iurS/sLDim7e4lFTfJYWPUOQjFIsFH6FYMkx8hCI/v+JDf+6Nj1AEAKAcIYwBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCMMAYAwDDCGAAAwwhjAAAMI4wBADCs1ISxl5e3y78Arh+ef4BZpSaMe/bsrerVa6hnz96mSwHKHZ5/gFkepgvIERHRTRER3UyXAZRLPP8As0rNkTEAAOUVYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhnmYLsAdnNuzwnQJkkpPHQCA4sWRMQAAhhHGAAAYxmnqPCxevNx0CcXC17eqJOnEiXOGKwEA5IUjYwAADCOMAQAwjDAGAMAwwhgAAMMIYwAADCOMAQAwjDAGAMAwwhgA
AMMIYwAADCOMAQAwjDAGAMAwwhgAAMMIYwAADCOMAQAwjDAGAMAwh2VZ1rVu7HRayszMKs56rjtPz+yPdM7IyDRcScmgP/dGf+6N/tzb5f15eFRUhQqOErmvIoUxAAAoOk5TAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIaVqzA+dOiQhgwZojvvvFNhYWGaPHmyLly4IEk6evSoBgwYoKCgIHXp0kVffPGF4WqLZty4cYqKirJvl4X+Ll68qNjYWLVt21Zt27bVSy+9pIyMDEllo78zZ85o5MiRCgkJ0d13361p06YpKytLknv3l5GRoe7du+vrr7+2l+XXz7Zt29SjRw8FBgYqKipKv/zyy/Uuu8By62/r1q3q3bu3goOD9cADD+j999932cbd+7t0Xbdu3TRnzhyX5e7e3/HjxxUdHa2goCDde++9WrZsmcs2JdFfuQnjjIwMDRkyRJ6enlqxYoWmTZumDRs2aObMmbIsS9HR0apRo4Y++OAD9erVS8OHD9fhw4dNl31Ntm7dqg8++MC+XVb6mzJlitavX6/58+drwYIF2rx5s+bNm1dm+ps4caKOHz+upUuXaurUqVq9erWWLFni1v1duHBBzz77rJKSkuxl+fVz7NgxDR06VA8++KDi4+P1pz/9SdHR0XI6nabayFNu/R08eFCDBw/W/fffr9WrV+upp55STEyMNm7cKMn9+7vUggULtG/fPpdl7t6f0+nU0KFDdeHCBcXHx2vkyJGKjY3Vli1bJJVgf1Y5sX37dsvf399KSUmxl3300UdWu3btrK+//tpq2bKlde7cOXtd//79rRkzZpgotUhSU1Otjh07WpGRkdZjjz1mWZZVJvo7c+aM5e/vb3311Vf2svj4eOuJJ54oE/1ZlmW1bt3aWr9+vX07NjbWrftLSkqyHnzwQatHjx7WbbfdZm3ZssWyrPx/H2fNmmVFRkba69LS0qzg4GB7+9Iir/7mzZtn9e3b12Xsiy++aD399NOWZbl/fzn27NljtW/f3oqIiLBef/11e7m797dp0yYrODjYSk5OtseOHz/emjNnjmVZJddfuTkybtKkiRYuXKgqVarYyxwOhzIyMpSQkKAWLVrIx8fHXtemTRvt2rXLQKVFM3PmTIWEhCgkJMReVhb627Fjh7y9vdWuXTt72cMPP6xFixaVif4kqUaNGvroo4+Unp6u48ePa/PmzfL393fb/r777ju1b99eK1eudFmeXz8JCQm688477XWVK1eWv7+/du7ceV3qLqi8+uvSpYvGjx/vsszhcNgvibl7f5KUlZWlF154QSNHjlSNGjVc1rl7f9u2bVPbtm1d+oqJidGwYcMklVx/HkXa2o3UqlXLZUfudDq1dOlStWnTRidOnJCfn5/L+BtvvFG//fbb9S6zSHbu3Kl169bp448/1uLFi+3lZaG/Q4cOqV69evr444/1xhtvKC0tTREREXrmmWfKRH+S9NJLL2nUqFFq3bq1nE6n7rrrLv3tb39TbGysW/YXGRmZ6/L8fl55rT9+/HjJFHqN8uqvcePGLrdPnjypf/zjH/bO3N37k6S33npLNWvW1EMPPXRFmLl7f4cOHVLdunU1c+ZMrV69Wj4+Pnr88cfVp08fSSXXX7kJ48vFxsZqz549+uCDD7RkyRJVqlTJZb2np6cuXrxoqLrCy8jI0Lhx4/TCCy+oevXqLuvS09Pdvr/U1FQdOXJES5cu1cSJE5WamqqJEycqMzOzTPQnZe8EWrRooaeeekopKSmaNGmSXnvttTLTX478+klPT5enp+cV63Mu1nMnaWlpGjZsmPz8/Oydv7v3d+DAAb311luKj4/Pdb2795eamqo1a9aoc+fOmjdvnnbv3q2YmBjVrFlTnTp1KrH+yl0YW5alV155
Re+9955mz56tpk2bysvLSykpKS7jMjIy5O3tbajKwps3b54aNmyoLl26XLGuLPTn4eGhlJQUTZ06VQ0aNJAkjRo1SqNGjVKvXr3cvr9Dhw7p1Vdf1caNG3XTTTdJyv65DRgwQH369HH7/i6V3++jl5fXFTu2jIyMK06Hlnbnzp3T4MGDdeTIES1fvlyVK1eW5N79WZalcePGaejQoapfv36uY9y5P0mqWLGiqlWrpkmTJqlixYoKCAhQYmKi3nvvPXXq1KnE+itXYex0OjVu3DitXbtWM2fOVKdOnSRJtWvXVmJiosvYkydPytfX10SZ12Tt2rU6ceKEgoODJWW/DSgrK0vBwcEaPHiw2/fn5+cnDw8PO4il7NOBFy5ckK+vr/bu3esy3t36+/HHH1WlShU7iCUpICBAWVlZZaK/S+X3fKtdu7ZOnDhxxfqmTZtetxqL6o8//tATTzyhkydP6p133nH5vXXn/n799Vft2LFDu3fv1uzZsyVJ58+f1w8//KCEhAQtWrTIrfuTsvc1TqdTFStWtJc1btxYW7dulVRyP79ycwGXJE2ePFlr167VnDlz1LlzZ3t5YGCgEhMTlZaWZi/bsWOHgoKCDFR5bd599119/PHHWr16tVavXq0+ffooICBAq1evLhP9BQUFKTMzUz///LO9bP/+/apSpYqCgoLcvj8/Pz+dPXtWx44ds5ft379fUvbFh+7e36Xy+30MDAzU999/b69LT0/X7t273abfnLdRJicna9myZWrSpInLenfur3bt2vrss8+0Zs0ae1/TvHlzRUZG6pVXXpHk3v1JUnBwsPbu3evyMtC+fftUr149SSXXX7kJ4127dikuLk7Dhw9XQECATpw4YX+FhISobt26GjNmjJKSkrRw4UIlJCTYL9i7g3r16qlhw4b2V7Vq1eTt7a2GDRuWif4aNWqkjh07auzYsfrxxx/13Xffadq0aerbt69CQ0Pdvr+goCA1b95cY8eOVWJionbt2qXx48erZ8+eeuCBB9y+v0vl9/vYu3dvJSQk2O9hHTdunOrWravQ0FDDlRfM22+/rZ9++kmxsbGqXLmyvZ85ffq0JPfuz8PDw2U/07BhQ3l5eal69eqqXbu2JPfuT5K6du0qDw8Pvfjiizpw4IDWrFmjv//973r00UcllWB/RXpjlBuZPHmyddttt+X6dfHiRevgwYNWv379rICAAKtr167W5s2bTZdcJDNmzLDfZ2xZVpno79y5c9aYMWOs1q1bWyEhIdarr75qZWRkWJZVNvr77bffrOHDh1shISFW+/btrUmTJlnp6emWZbl/f5e/TzW/fjZt2mQ98MADVqtWrayoqCjrl19+ud4lF8ql/fXq1SvX/cyl70115/4uFxkZ6fI+Y8ty//72799v9e/f3woICLDuu+8+a9WqVS7jS6I/h2VZVjH8MQEAAK5RuTlNDQBAaUUYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMAIBhhDEAAIYRxgAAGEYYAwBgGGEMFLPw8HDFxMSYLuO6OXLkiJo1a6Z169aZLgVwW4QxAACGEcYAABhGGKNMatasmVasWKGhQ4cqMDBQ4eHhWrp0qb0+r1OrPXv21JgxYyRJ33zzjT1Phw4ddM899+jIkSOSpJUrV6pbt25q1aqVIiIitGrVKpd5zp8/rwkTJigkJERt2rTR6NGjlZKSYq9PSUnRyy+/rPvuu08BAQG66667NHr0aJ09e9Yek5CQoH79+ik4OFghISEaPny4jh496nI/77zzjjp37qyAgAB169ZNn3zySaEep6ioKD3xxBMuy5xOp9q3b6/Zs2dLkn7//XeNHTtWHTp0kL+/vzp06KBXXnlFGRkZuc45Z84cBQcHuyzbs2ePmjVrpm+++cZe9uOPP6p///4KDAzUXXfdpUmTJik9Pb1Q9QNlhYfpAoCSMm3aNN1zzz2aM2eOtmzZokmTJsnT01N9+/Yt1Dzz589XTEyMzp49
q/r162vJkiV67bXX9PjjjyssLEzffvutxo8frxtuuEHdu3eXJH344YeKiIjQrFmztHfvXk2ZMkU1a9a0g/65555TUlKSnnvuOfn6+iohIUGzZ8+2x6Snp2vQoEFq3769hg0bprNnz2rq1Kl69tlntXLlSknS3LlztWDBAj355JO644479MUXX+jZZ5+Vw+FQly5dCtRb9+7dFRMTo+TkZNWsWVNS9h8hJ0+eVPfu3eV0OjVw4EA5HA699NJL8vHx0VdffaVFixapQYMGioqKKtRjmWPfvn167LHHFBQUpFmzZunUqVOaPn26jhw5ojfffPOa5gTcGWGMMqtJkyaaPn26JCksLEzHjh3TG2+8Uegw7t+/v8LDwyVlHzW+8cYbevjhh+1gbdeunQ4fPqwdO3bYYdy4cWPNmDFDDodD7dq107Zt2+yjwgsXLujixYuaMGGCwsLCJElt27bVzp079e2330qSkpKSdPr0aUVFRdlHmTVr1tS2bdvkdDqVkpKihQsXauDAgXr66aclSR06dFBqaqqmT59e4DCOiIjQpEmTtGHDBvXp00eS9Omnn+r222/XLbfcomPHjql69eoaN26cbr/9dklSaGioNm/erO3bt19zGM+fP1833nijFi5cKE9PT0lSo0aN1K9fP23fvl133nnnNc0LuCvCGGVW165dXW537NhR//znP/Xbb78Vap5bb73V/v7AgQM6ffq0Hc45ckI/R2BgoBwOh327fv36SkpKkiR5eXlp8eLFkrJPlx88eFBJSUnav3+/vLy8JGX/IVGjRg0NGTJE3bp10z333KPQ0FCFhIRIknbt2qULFy7o3nvvVWZmpn0/YWFhio+P1+HDh3XzzTfn21v16tXVoUMHrVu3Tn369FFWVpbWr1+vAQMGSJLq1Kmjd999V06nUwcPHtTBgweVmJioU6dOqW7dugV+DC/3zTffqGPHjqpQoYJdf1BQkHx8fLR161bCGOUOYYwyy8/Pz+V2rVq1JEmnT5+Wj49PgefJ2S5n28uX5aZy5coutx0OhyzLsm9//vnnio2N1eHDh1WzZk0FBATI29tbTqdTkuTj46OlS5dq3rx5+vDDD7Vs2TJVq1ZNzzzzjB599FG7jsjIyFzv/8SJEwUKY0nq0aOHRo0apdOnT2v37t1KTk5Wt27d7PXvv/++Zs2apZMnT8rX11eBgYHy8vJy6aewTp8+rZUrV9qn3C+vHShvCGOUWcnJyS63T506JSk7SC9evChJdvjlSEtLu+qcVatWlST98ccfLssPHDig5ORktW7dOt+6Dh48qBEjRqhXr15aunSpbrrpJknSiBEjtH//fntc06ZNNWvWLGVkZGjHjh2Ki4vTxIkT5e/vb9cxb9481a5d+4r7aNy4cb515AgPD5enp6c2btyoXbt2KTg42D7qzXk9PDo6Wo899pj9R8if//znPOdzOBxXPK6pqakut318fNSxY0c98sgjV2yf89o1UJ5wNTXKrE2bNrnc/vzzz9WkSRP5+fnZR8a///67vf748eP21dJ5yTl9fPncs2fP1pQpUwpU1+7du3Xx4kUNGjTIDuK0tDTt2LHDPtr88ssvFRoaqj/++EOenp4KDQ3V+PHjJUm//vqrAgMDValSJZ06dUotW7a0v5KSkjRv3rwC1ZGjcuXKCg8P17/+9S9t2LDBft1byj4d7nA4NHToUDuIjx8/rr179+Z5ZOzj46Pz58+7XBm+Y8cOlzFt2rTR//3f/ykgIMCuvU6dOpo+fbp9Oh8oTzgyRpm1efNmxcTEKDw8XJs2bdL69es1a9YsSdmvlQYGBmrx4sWqU6eOKlasqLlz56patWpXndPDw0ODBw/W1KlTVbNmTYWGhmr79u1at26d5s6dW6C6mjdvrooVK2rq1Kl65JFHlJycrMWLF+vkyZP2xUytWrWSZVkaNmyYnnzySVWqVElxcXGqVq2a2rZtq1q1aikqKkqTJ0/WmTNn1KpVKyUmJmrmzJnq
2LFjoU7DS9mnqqOjo6+4Ertly5ZyOp169dVXFRERoWPHjmnBggXKyMjI821Id999t2JjYzVu3Dj169dPiYmJWr58ucuY6OhoRUZGasSIEerdu7cyMjI0f/58HTt2TC1atChU7UBZQBijzBo4cKD27Nmj6OhoNWjQQDNnzlRERIS9PjY2VhMmTNDIkSPl6+urQYMG6euvv8533gEDBsjLy0txcXF6++231ahRI82YMUOdOnUqUF2NGzfWa6+9prlz52rQoEHy9fVVWFiYevfurZiYGB0/fly1a9fWokWLNH36dI0aNUoXL15Uq1attGTJEvsI9fnnn1etWrW0atUqvf766/Lz81P//v01bNiwQj9WHTp0ULVq1eTv7+/yenhoaKjGjh2ruLg4xcfH66abblKXLl3k4eGhuLi4XN9rfMstt+jll1+233YVGBio119/3eUq9oCAAMXFxWnWrFkaPny4vLy81Lp1a02ZMiXX0+5AWeewinIVBlBKNWvWTKNGjbriP7QAgNKII2OgDLIsS1lZWfmO8/BgFwCUBjwTgTLoww8/1NixY/Md9/PPP1+HagDkh9PUQBmUnJyc75XhUvYFWgDMI4wBADCM9xkDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAYRhgDAGAYYQwAgGGEMQAAhhHGAAAY9v/h38sQjUXOvwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 576x259.2 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots(figsize=(4, 1.8))\n",
    "sns.boxplot(data=df_train, x='purchase_value');"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5311267",
   "metadata": {},
   "source": [
    "### source"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "89a79070",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SEO       0.403167\n",
       "Ads       0.395021\n",
       "Direct    0.201813\n",
       "Name: source, dtype: float64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['source'].value_counts(normalize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2759b91",
   "metadata": {},
   "source": [
    "### browser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "df474cf7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Chrome     38981\n",
       "IE         23260\n",
       "FireFox    15690\n",
       "Safari     15687\n",
       "Opera       2382\n",
       "Name: browser, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['browser'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5577df5a",
   "metadata": {},
   "source": [
    "### sex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "62778bdd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "M    56087\n",
       "F    39913\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['sex'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e278e13",
   "metadata": {},
   "source": [
    "### country"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "e7853f70",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "United States     36977\n",
       "Unknown           13882\n",
       "China              7595\n",
       "Japan              4622\n",
       "United Kingdom     2869\n",
       "Name: country, dtype: int64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = df_train['country'].value_counts()\n",
    "s.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c12ba255",
   "metadata": {},
   "source": [
    "Find the percentage of countries that appear fewer than 50 times in the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "0df1e79b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "61.0"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "round((s < 50).mean(), 3) * 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcdb2174",
   "metadata": {},
   "source": [
    "## class\n",
    "\n",
    "Get the frequency of each value of the target column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "809eba76",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    85845\n",
       "1    10155\n",
       "Name: class, dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['class'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6489c345",
   "metadata": {},
   "source": [
    "10.6% of the training transactions are fraud (class 1)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "81dbf202",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    89.4\n",
       "1    10.6\n",
       "Name: class, dtype: float64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train['class'].value_counts(normalize=True).round(3) * 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69d59dc3",
   "metadata": {},
   "source": [
    "## Univariate exploration\n",
    "\n",
    "Explore the relationship between each of these simple columns and the target. Let's see whether any values have a fraud rate that differs meaningfully from the overall 10.6%.\n",
    "\n",
    "There is a small signal in source: Direct has a fraud rate of 11.9%, about 1.3 percentage points above the overall rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "92d31eff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>source</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ads</th>\n",
       "      <td>37922</td>\n",
       "      <td>0.103845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Direct</th>\n",
       "      <td>19374</td>\n",
       "      <td>0.118613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SEO</th>\n",
       "      <td>38704</td>\n",
       "      <td>0.101256</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        count  perc_fraud\n",
       "source                   \n",
       "Ads     37922    0.103845\n",
       "Direct  19374    0.118613\n",
       "SEO     38704    0.101256"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.groupby('source').agg(count=('class', 'size'), perc_fraud=('class', 'mean'))"
   ]
  },
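  {
   "cell_type": "markdown",
   "id": "7c2e41aa",
   "metadata": {},
   "source": [
    "As a rough check that is not part of the original analysis, the gap between Direct and the other two sources can be put through a two-proportion z-test. This is a minimal sketch using only the standard library: `two_proportion_z_test` is a hypothetical helper name, and the fraud counts are an assumption reconstructed from the table above as `count * perc_fraud`, rounded to the nearest integer.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def two_proportion_z_test(fraud_a, n_a, fraud_b, n_b):\n",
    "    # Returns (z, two-sided p-value) for H0: both groups share one fraud rate.\n",
    "    p_a = fraud_a / n_a\n",
    "    p_b = fraud_b / n_b\n",
    "    # Pooled rate under the null hypothesis of no difference.\n",
    "    p_pool = (fraud_a + fraud_b) / (n_a + n_b)\n",
    "    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))\n",
    "    z = (p_a - p_b) / se\n",
    "    # Two-sided tail probability of the standard normal distribution.\n",
    "    p_value = math.erfc(abs(z) / math.sqrt(2))\n",
    "    return z, p_value\n",
    "\n",
    "\n",
    "# Assumed counts: Direct vs. Ads and SEO combined (count * perc_fraud, rounded).\n",
    "direct_fraud, direct_n = 2298, 19374\n",
    "other_fraud, other_n = 3938 + 3919, 37922 + 38704\n",
    "z, p_value = two_proportion_z_test(direct_fraud, direct_n, other_fraud, other_n)\n",
    "print(f'z = {z:.2f}, p = {p_value:.3g}')\n",
    "```\n",
    "\n",
    "With about 96,000 rows, even a gap of roughly one percentage point tends to come out as statistically detectable, so the practical size of the difference matters more than the p-value."
   ]
  },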
  {
   "cell_type": "markdown",
   "id": "9299c9c8",
   "metadata": {},
   "source": [
    "Browser does not appear to carry much signal. IE has a slightly lower fraud rate (9.6%) and Chrome a slightly higher one (11.3%)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "078bcb97",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>browser</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Chrome</th>\n",
       "      <td>38981</td>\n",
       "      <td>0.112516</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <td>15690</td>\n",
       "      <td>0.106883</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IE</th>\n",
       "      <td>23260</td>\n",
       "      <td>0.096303</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Opera</th>\n",
       "      <td>2382</td>\n",
       "      <td>0.102015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <td>15687</td>\n",
       "      <td>0.102569</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         count  perc_fraud\n",
       "browser                   \n",
       "Chrome   38981    0.112516\n",
       "FireFox  15690    0.106883\n",
       "IE       23260    0.096303\n",
       "Opera     2382    0.102015\n",
       "Safari   15687    0.102569"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.groupby('browser').agg(count=('class', 'size'), perc_fraud=('class', 'mean'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8d80866",
   "metadata": {},
   "source": [
    "The fraud rate is slightly higher for males (10.8%) than for females (10.3%)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "ae6f6c99",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>39913</td>\n",
       "      <td>0.102799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>56087</td>\n",
       "      <td>0.107904</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     count  perc_fraud\n",
       "sex                   \n",
       "F    39913    0.102799\n",
       "M    56087    0.107904"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.groupby('sex').agg(count=('class', 'size'), perc_fraud=('class', 'mean'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d077d673",
   "metadata": {},
   "source": [
    "Look at the countries with the highest and lowest fraud rates among those with more than 50 transactions. There appears to be much more signal here, though the counts are small, so the individual rates are noisy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "ba5bbc4c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Peru</th>\n",
       "      <td>74</td>\n",
       "      <td>0.351351</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ecuador</th>\n",
       "      <td>68</td>\n",
       "      <td>0.323529</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tunisia</th>\n",
       "      <td>78</td>\n",
       "      <td>0.294872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kuwait</th>\n",
       "      <td>60</td>\n",
       "      <td>0.266667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ireland</th>\n",
       "      <td>169</td>\n",
       "      <td>0.266272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lithuania</th>\n",
       "      <td>62</td>\n",
       "      <td>0.258065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saudi Arabia</th>\n",
       "      <td>181</td>\n",
       "      <td>0.243094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New Zealand</th>\n",
       "      <td>186</td>\n",
       "      <td>0.225806</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Denmark</th>\n",
       "      <td>309</td>\n",
       "      <td>0.184466</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Venezuela</th>\n",
       "      <td>152</td>\n",
       "      <td>0.184211</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chile</th>\n",
       "      <td>268</td>\n",
       "      <td>0.175373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Belgium</th>\n",
       "      <td>252</td>\n",
       "      <td>0.162698</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Arab Emirates</th>\n",
       "      <td>74</td>\n",
       "      <td>0.162162</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Greece</th>\n",
       "      <td>145</td>\n",
       "      <td>0.151724</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ukraine</th>\n",
       "      <td>285</td>\n",
       "      <td>0.150877</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      count  perc_fraud\n",
       "country                                \n",
       "Peru                     74    0.351351\n",
       "Ecuador                  68    0.323529\n",
       "Tunisia                  78    0.294872\n",
       "Kuwait                   60    0.266667\n",
       "Ireland                 169    0.266272\n",
       "Lithuania                62    0.258065\n",
       "Saudi Arabia            181    0.243094\n",
       "New Zealand             186    0.225806\n",
       "Denmark                 309    0.184466\n",
       "Venezuela               152    0.184211\n",
       "Chile                   268    0.175373\n",
       "Belgium                 252    0.162698\n",
       "United Arab Emirates     74    0.162162\n",
       "Greece                  145    0.151724\n",
       "Ukraine                 285    0.150877"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_country = df_train.groupby('country').agg(count=('class', 'size'), perc_fraud=('class', 'mean'))\n",
    "df_country.query('count > 50').nlargest(15, 'perc_fraud')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "008957b1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Slovenia</th>\n",
       "      <td>57</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bulgaria</th>\n",
       "      <td>102</td>\n",
       "      <td>0.009804</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Morocco</th>\n",
       "      <td>106</td>\n",
       "      <td>0.028302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pakistan</th>\n",
       "      <td>117</td>\n",
       "      <td>0.034188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Israel</th>\n",
       "      <td>169</td>\n",
       "      <td>0.035503</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kazakhstan</th>\n",
       "      <td>52</td>\n",
       "      <td>0.038462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Portugal</th>\n",
       "      <td>140</td>\n",
       "      <td>0.042857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Slovakia (SLOVAK Republic)</th>\n",
       "      <td>57</td>\n",
       "      <td>0.052632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>European Union</th>\n",
       "      <td>160</td>\n",
       "      <td>0.056250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Romania</th>\n",
       "      <td>333</td>\n",
       "      <td>0.057057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Poland</th>\n",
       "      <td>454</td>\n",
       "      <td>0.057269</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Malaysia</th>\n",
       "      <td>136</td>\n",
       "      <td>0.058824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Philippines</th>\n",
       "      <td>95</td>\n",
       "      <td>0.063158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thailand</th>\n",
       "      <td>193</td>\n",
       "      <td>0.072539</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Viet Nam</th>\n",
       "      <td>338</td>\n",
       "      <td>0.073964</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            count  perc_fraud\n",
       "country                                      \n",
       "Slovenia                       57    0.000000\n",
       "Bulgaria                      102    0.009804\n",
       "Morocco                       106    0.028302\n",
       "Pakistan                      117    0.034188\n",
       "Israel                        169    0.035503\n",
       "Kazakhstan                     52    0.038462\n",
       "Portugal                      140    0.042857\n",
       "Slovakia (SLOVAK Republic)     57    0.052632\n",
       "European Union                160    0.056250\n",
       "Romania                       333    0.057057\n",
       "Poland                        454    0.057269\n",
       "Malaysia                      136    0.058824\n",
       "Philippines                    95    0.063158\n",
       "Thailand                      193    0.072539\n",
       "Viet Nam                      338    0.073964"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_country.query('count > 50').nsmallest(15, 'perc_fraud')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "492c0cde",
   "metadata": {},
   "source": [
    "Looking at the 15 largest countries by transaction count, most have fraud rates close to the overall average."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "1acc35a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>36977</td>\n",
       "      <td>0.108797</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Unknown</th>\n",
       "      <td>13882</td>\n",
       "      <td>0.096600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>China</th>\n",
       "      <td>7595</td>\n",
       "      <td>0.094799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Japan</th>\n",
       "      <td>4622</td>\n",
       "      <td>0.109693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <td>2869</td>\n",
       "      <td>0.117114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Korea Republic of</th>\n",
       "      <td>2644</td>\n",
       "      <td>0.103253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>2311</td>\n",
       "      <td>0.082215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>1962</td>\n",
       "      <td>0.110601</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>1911</td>\n",
       "      <td>0.130822</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brazil</th>\n",
       "      <td>1884</td>\n",
       "      <td>0.105096</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>1272</td>\n",
       "      <td>0.094340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>1188</td>\n",
       "      <td>0.106061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Netherlands</th>\n",
       "      <td>1053</td>\n",
       "      <td>0.080722</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Russian Federation</th>\n",
       "      <td>1005</td>\n",
       "      <td>0.082587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>812</td>\n",
       "      <td>0.130542</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    count  perc_fraud\n",
       "country                              \n",
       "United States       36977    0.108797\n",
       "Unknown             13882    0.096600\n",
       "China                7595    0.094799\n",
       "Japan                4622    0.109693\n",
       "United Kingdom       2869    0.117114\n",
       "Korea Republic of    2644    0.103253\n",
       "Germany              2311    0.082215\n",
       "France               1962    0.110601\n",
       "Canada               1911    0.130822\n",
       "Brazil               1884    0.105096\n",
       "Italy                1272    0.094340\n",
       "Australia            1188    0.106061\n",
       "Netherlands          1053    0.080722\n",
       "Russian Federation   1005    0.082587\n",
       "India                 812    0.130542"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_country.nlargest(15, 'count')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1120c76d",
   "metadata": {},
   "source": [
    "Here, the Pearson correlation coefficient is calculated between `class` and each numeric column. All coefficients are close to zero, so there appears to be no linear relationship."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "535453a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id           0.001075\n",
       "purchase_value    0.003553\n",
       "age               0.006828\n",
       "ip_address       -0.003796\n",
       "class             1.000000\n",
       "Name: class, dtype: float64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.select_dtypes('number').corr()['class']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dae9e97",
   "metadata": {},
   "source": [
    "We can still bin the numeric columns to see if there is any relationship. For purchase value, the highest-priced bin, (95, 200], has the lowest fraud rate at 7.2%, while the (85, 95] bin has the highest at 15.6%. Both bins contain relatively few transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "ca0ccaa6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>purchase_value</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(5, 15]</th>\n",
       "      <td>11209</td>\n",
       "      <td>10.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(15, 25]</th>\n",
       "      <td>19118</td>\n",
       "      <td>11.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(25, 35]</th>\n",
       "      <td>19663</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(35, 45]</th>\n",
       "      <td>17333</td>\n",
       "      <td>10.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(45, 55]</th>\n",
       "      <td>13019</td>\n",
       "      <td>10.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(55, 65]</th>\n",
       "      <td>8308</td>\n",
       "      <td>11.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(65, 75]</th>\n",
       "      <td>4331</td>\n",
       "      <td>10.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(75, 85]</th>\n",
       "      <td>1894</td>\n",
       "      <td>9.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(85, 95]</th>\n",
       "      <td>778</td>\n",
       "      <td>15.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(95, 200]</th>\n",
       "      <td>347</td>\n",
       "      <td>7.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                count  perc_fraud\n",
       "purchase_value                   \n",
       "(5, 15]         11209        10.3\n",
       "(15, 25]        19118        11.1\n",
       "(25, 35]        19663        10.0\n",
       "(35, 45]        17333        10.3\n",
       "(45, 55]        13019        10.7\n",
       "(55, 65]         8308        11.9\n",
       "(65, 75]         4331        10.2\n",
       "(75, 85]         1894         9.2\n",
       "(85, 95]          778        15.6\n",
       "(95, 200]         347         7.2"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = pd.cut(df_train['purchase_value'], bins=list(range(5, 105, 10)) + [200])\n",
    "df_temp = df_train.groupby(g).agg(count=('class', 'size'), perc_fraud=('class', 'mean'))\n",
    "df_temp['perc_fraud'] = df_temp['perc_fraud'].round(3) * 100\n",
    "df_temp"
   ]
  },
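  {
   "cell_type": "markdown",
   "id": "d6dd5424",
   "metadata": {},
   "source": [
    "Again, there is not much signal with age. The only bins that deviate noticeably from the average, (60, 65] and (65, 100], contain very few transactions."
   ]
  },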
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "ed119afa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(15, 20]</th>\n",
       "      <td>5751</td>\n",
       "      <td>10.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(20, 25]</th>\n",
       "      <td>14321</td>\n",
       "      <td>10.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(25, 30]</th>\n",
       "      <td>19323</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(30, 35]</th>\n",
       "      <td>20586</td>\n",
       "      <td>10.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(35, 40]</th>\n",
       "      <td>16942</td>\n",
       "      <td>11.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(40, 45]</th>\n",
       "      <td>10735</td>\n",
       "      <td>10.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(45, 50]</th>\n",
       "      <td>5314</td>\n",
       "      <td>9.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(50, 55]</th>\n",
       "      <td>2159</td>\n",
       "      <td>11.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(55, 60]</th>\n",
       "      <td>668</td>\n",
       "      <td>11.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(60, 65]</th>\n",
       "      <td>177</td>\n",
       "      <td>15.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(65, 100]</th>\n",
       "      <td>24</td>\n",
       "      <td>4.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           count  perc_fraud\n",
       "age                         \n",
       "(15, 20]    5751        10.3\n",
       "(20, 25]   14321        10.3\n",
       "(25, 30]   19323        10.0\n",
       "(30, 35]   20586        10.9\n",
       "(35, 40]   16942        11.3\n",
       "(40, 45]   10735        10.8\n",
       "(45, 50]    5314         9.5\n",
       "(50, 55]    2159        11.3\n",
       "(55, 60]     668        11.4\n",
       "(60, 65]     177        15.3\n",
       "(65, 100]     24         4.2"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = pd.cut(df_train['age'], bins=list(range(15, 70, 5)) + [100])\n",
    "df_temp = df_train.groupby(g).agg(count=('class', 'size'), perc_fraud=('class', 'mean'))\n",
    "df_temp['perc_fraud'] = df_temp['perc_fraud'].round(3) * 100\n",
    "df_temp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "524917e5",
   "metadata": {},
   "source": [
    "We can even check the user_id to see if there is signal there. The results are what we would expect from randomness."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "b25cc1eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id\n",
       "(8177.98, 12054.85]       0.130208\n",
       "(364266.09, 368258.32]    0.128125\n",
       "(375982.12, 380034.25]    0.128125\n",
       "(396081.03, 400000.0]     0.128125\n",
       "(168224.74, 172349.71]    0.126042\n",
       "Name: class, dtype: float64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.groupby(pd.qcut(df_train['user_id'], 100))['class'].mean().nlargest()"
   ]
  },
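  {
   "cell_type": "markdown",
   "id": "e4b1a7c2",
   "metadata": {},
   "source": [
    "As a rough check of the randomness claim, the sketch below compares the top bin's fraud rate with the binomial standard error of a bin of that size. It assumes 96,000 training rows (so 960 per percentile bin) and an overall fraud rate of about 10.6%, which is estimated from the binned tables above rather than computed here. The variable names are illustrative only.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "n_per_bin = 96000 // 100  # assumed rows per percentile bin\n",
    "p = 0.106                 # assumed overall fraud rate (estimate, not computed from the data here)\n",
    "\n",
    "# Binomial standard error of the fraud rate in a single bin\n",
    "se = math.sqrt(p * (1 - p) / n_per_bin)\n",
    "\n",
    "# Number of standard errors the top bin (0.130208) sits above the assumed rate\n",
    "z_top = (0.130208 - p) / se\n",
    "```\n",
    "\n",
    "Under these assumptions the top bin sits between two and three standard errors above the assumed average, which is not unusual for the largest of 100 bins."
   ]
  },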
  {
   "cell_type": "markdown",
   "id": "8a5c9956",
   "metadata": {},
   "source": [
    "## Multivariate exploration\n",
    "\n",
    "We can look at different combinations of the categorical variables to see if there is any signal. We write a function to calculate the percent fraud for different groupings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "0a0c31f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fraud_group(df, cols, min_count=50, n=10):\n",
    "    # Per-group transaction count, fraud rate, and average purchase value\n",
    "    df = df.groupby(cols) \\\n",
    "           .agg(count=('class', 'size'),\n",
    "                perc_fraud=('class', 'mean'),\n",
    "                avg_price=('purchase_value', 'mean'))\n",
    "    # Express the fraud rate as a percentage\n",
    "    df['perc_fraud'] = df['perc_fraud'].round(3) * 100\n",
    "    # Keep groups with more than min_count rows and return the n with the highest fraud rate\n",
    "    return df.query('count > @min_count').nlargest(n, 'perc_fraud')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8701662c",
   "metadata": {},
   "source": [
    "Grouping by browser, source, and sex reveals only a few combinations with a weak signal. The highest fraud rate is 14.7%, and it comes from a group of only 285 transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "5cbd218e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>browser</th>\n",
       "      <th>source</th>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Opera</th>\n",
       "      <th>Direct</th>\n",
       "      <th>M</th>\n",
       "      <td>285</td>\n",
       "      <td>14.7</td>\n",
       "      <td>38.007018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Chrome</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">Direct</th>\n",
       "      <th>M</th>\n",
       "      <td>4675</td>\n",
       "      <td>13.8</td>\n",
       "      <td>36.823316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>3401</td>\n",
       "      <td>13.4</td>\n",
       "      <td>36.905616</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IE</th>\n",
       "      <th>Direct</th>\n",
       "      <th>M</th>\n",
       "      <td>2751</td>\n",
       "      <td>12.5</td>\n",
       "      <td>37.643402</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>2630</td>\n",
       "      <td>12.2</td>\n",
       "      <td>35.891635</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <th>Ads</th>\n",
       "      <th>M</th>\n",
       "      <td>3672</td>\n",
       "      <td>11.9</td>\n",
       "      <td>36.590959</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Opera</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>418</td>\n",
       "      <td>11.5</td>\n",
       "      <td>38.011962</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>2814</td>\n",
       "      <td>11.5</td>\n",
       "      <td>37.046908</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chrome</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>6411</td>\n",
       "      <td>11.4</td>\n",
       "      <td>37.026205</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>2636</td>\n",
       "      <td>11.2</td>\n",
       "      <td>37.750759</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    count  perc_fraud  avg_price\n",
       "browser source sex                              \n",
       "Opera   Direct M      285        14.7  38.007018\n",
       "Chrome  Direct M     4675        13.8  36.823316\n",
       "               F     3401        13.4  36.905616\n",
       "IE      Direct M     2751        12.5  37.643402\n",
       "FireFox SEO    F     2630        12.2  35.891635\n",
       "Safari  Ads    M     3672        11.9  36.590959\n",
       "Opera   SEO    F      418        11.5  38.011962\n",
       "Safari  SEO    F     2814        11.5  37.046908\n",
       "Chrome  Ads    F     6411        11.4  37.026205\n",
       "FireFox Ads    F     2636        11.2  37.750759"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_train, ['browser', 'source', 'sex'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fce371f3",
   "metadata": {},
   "source": [
    "Adding in country, there are some combinations that have much higher fraud, though the overall counts are low."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "8440a482",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th>browser</th>\n",
       "      <th>source</th>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Denmark</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>SEO</th>\n",
       "      <th>M</th>\n",
       "      <td>53</td>\n",
       "      <td>49.1</td>\n",
       "      <td>32.264151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <th>Safari</th>\n",
       "      <th>SEO</th>\n",
       "      <th>M</th>\n",
       "      <td>101</td>\n",
       "      <td>39.6</td>\n",
       "      <td>31.910891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sweden</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>57</td>\n",
       "      <td>33.3</td>\n",
       "      <td>33.789474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Australia</th>\n",
       "      <th>FireFox</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>51</td>\n",
       "      <td>31.4</td>\n",
       "      <td>42.764706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <th>Ads</th>\n",
       "      <th>M</th>\n",
       "      <td>51</td>\n",
       "      <td>31.4</td>\n",
       "      <td>36.588235</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Canada</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Direct</th>\n",
       "      <th>F</th>\n",
       "      <td>80</td>\n",
       "      <td>31.2</td>\n",
       "      <td>42.462500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>71</td>\n",
       "      <td>29.6</td>\n",
       "      <td>43.915493</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Netherlands</th>\n",
       "      <th>Safari</th>\n",
       "      <th>Ads</th>\n",
       "      <th>M</th>\n",
       "      <td>54</td>\n",
       "      <td>29.6</td>\n",
       "      <td>35.740741</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <th>Safari</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>100</td>\n",
       "      <td>27.0</td>\n",
       "      <td>37.600000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mexico</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>61</td>\n",
       "      <td>26.2</td>\n",
       "      <td>41.196721</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                   count  perc_fraud  avg_price\n",
       "country        browser source sex                              \n",
       "Denmark        Chrome  SEO    M       53        49.1  32.264151\n",
       "Canada         Safari  SEO    M      101        39.6  31.910891\n",
       "Sweden         Chrome  Ads    F       57        33.3  33.789474\n",
       "Australia      FireFox SEO    F       51        31.4  42.764706\n",
       "               Safari  Ads    M       51        31.4  36.588235\n",
       "Canada         Chrome  Direct F       80        31.2  42.462500\n",
       "               FireFox Ads    F       71        29.6  43.915493\n",
       "Netherlands    Safari  Ads    M       54        29.6  35.740741\n",
       "United Kingdom Safari  SEO    F      100        27.0  37.600000\n",
       "Mexico         Chrome  Ads    F       61        26.2  41.196721"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_train, ['country', 'browser', 'source', 'sex'])"
   ]
  },
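  {
   "cell_type": "markdown",
   "id": "9c3d5f10",
   "metadata": {},
   "source": [
    "Because these groups are small, their fraud rates carry wide uncertainty. As a minimal sketch (not part of the original analysis), a Wilson score interval gives a rough range for a single group's rate. The helper name `wilson_interval` is hypothetical, and 26 frauds out of 53 is the count implied by the 49.1% shown for the first row above.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def wilson_interval(successes, n, z=1.96):\n",
    "    # Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)\n",
    "    p = successes / n\n",
    "    denom = 1 + z**2 / n\n",
    "    center = (p + z**2 / (2 * n)) / denom\n",
    "    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom\n",
    "    return center - half, center + half\n",
    "\n",
    "# Assumed count: 26 of 53 rows, implied by the 49.1% fraud rate in the table\n",
    "low, high = wilson_interval(26, 53)\n",
    "```\n",
    "\n",
    "The interval ignores the fact that these are the ten most extreme of many groups, so the true uncertainty is larger still."
   ]
  },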
  {
   "cell_type": "markdown",
   "id": "f3f6e611",
   "metadata": {},
   "source": [
    "We can also group on binned numeric columns. Here, the age bins are combined with browser and source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "952a8e2a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>browser</th>\n",
       "      <th>source</th>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>SEO</th>\n",
       "      <th>(55, 60]</th>\n",
       "      <td>62</td>\n",
       "      <td>38.7</td>\n",
       "      <td>34.951613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <th>Ads</th>\n",
       "      <th>(50, 55]</th>\n",
       "      <td>168</td>\n",
       "      <td>26.2</td>\n",
       "      <td>37.946429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chrome</th>\n",
       "      <th>Direct</th>\n",
       "      <th>(55, 60]</th>\n",
       "      <td>53</td>\n",
       "      <td>20.8</td>\n",
       "      <td>45.396226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>SEO</th>\n",
       "      <th>(50, 55]</th>\n",
       "      <td>164</td>\n",
       "      <td>20.1</td>\n",
       "      <td>35.469512</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">Opera</th>\n",
       "      <th>SEO</th>\n",
       "      <th>(20, 25]</th>\n",
       "      <td>170</td>\n",
       "      <td>20.0</td>\n",
       "      <td>36.558824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ads</th>\n",
       "      <th>(20, 25]</th>\n",
       "      <td>145</td>\n",
       "      <td>19.3</td>\n",
       "      <td>34.303448</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Direct</th>\n",
       "      <th>(30, 35]</th>\n",
       "      <td>114</td>\n",
       "      <td>17.5</td>\n",
       "      <td>39.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <th>SEO</th>\n",
       "      <th>(15, 20]</th>\n",
       "      <td>403</td>\n",
       "      <td>16.6</td>\n",
       "      <td>35.377171</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Chrome</th>\n",
       "      <th>SEO</th>\n",
       "      <th>(55, 60]</th>\n",
       "      <td>104</td>\n",
       "      <td>16.3</td>\n",
       "      <td>37.644231</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Direct</th>\n",
       "      <th>(35, 40]</th>\n",
       "      <td>1521</td>\n",
       "      <td>16.0</td>\n",
       "      <td>36.541749</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         count  perc_fraud  avg_price\n",
       "browser source age                                   \n",
       "FireFox SEO    (55, 60]     62        38.7  34.951613\n",
       "Safari  Ads    (50, 55]    168        26.2  37.946429\n",
       "Chrome  Direct (55, 60]     53        20.8  45.396226\n",
       "FireFox SEO    (50, 55]    164        20.1  35.469512\n",
       "Opera   SEO    (20, 25]    170        20.0  36.558824\n",
       "        Ads    (20, 25]    145        19.3  34.303448\n",
       "        Direct (30, 35]    114        17.5  39.000000\n",
       "FireFox SEO    (15, 20]    403        16.6  35.377171\n",
       "Chrome  SEO    (55, 60]    104        16.3  37.644231\n",
       "        Direct (35, 40]   1521        16.0  36.541749"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = pd.cut(df_train['age'], bins=list(range(15, 70, 5)) + [100])\n",
    "fraud_group(df_train, ['browser', 'source', g])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58fbb5a0",
   "metadata": {},
   "source": [
    "## Explore signup and purchase time\n",
    "\n",
    "Date columns require a different approach for analysis.\n",
    "\n",
    "### Seconds until purchase\n",
    "\n",
    "We begin by finding the seconds until purchase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "a902565d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    1.0\n",
       "2    1.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "secs_to_purchase = (df_train['purchase_time'] - df_train['signup_time']).dt.total_seconds()\n",
    "secs_to_purchase.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c4bc1c5",
   "metadata": {},
   "source": [
    "Interestingly, the first few rows all had purchases after one second and all were fraudulent. Let's select all purchases that happened in one second."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "d6e354f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "mean       1.0\n",
       "size    6021.0\n",
       "Name: class, dtype: float64"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filt = secs_to_purchase == 1\n",
    "df_train.loc[filt, 'class'].agg(['mean', 'size'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8de77474",
   "metadata": {},
   "source": [
    "Remarkably, every single transaction that took place in 1 second was marked as fraud."
   ]
  },
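  {
   "cell_type": "markdown",
   "id": "7f2a6b3e",
   "metadata": {},
   "source": [
    "This finding translates directly into a rule. The following is a minimal sketch, assuming only the `signup_time` and `purchase_time` columns used above. The function name `flag_one_second` and the toy DataFrame are hypothetical and exist only for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def flag_one_second(df):\n",
    "    # True where the purchase happened exactly one second after signup\n",
    "    secs = (df['purchase_time'] - df['signup_time']).dt.total_seconds()\n",
    "    return secs == 1\n",
    "\n",
    "# Toy data (hypothetical), not taken from the training set\n",
    "toy = pd.DataFrame({\n",
    "    'signup_time': pd.to_datetime(['2015-01-01 00:00:00', '2015-01-01 00:00:00']),\n",
    "    'purchase_time': pd.to_datetime(['2015-01-01 00:00:01', '2015-01-02 12:00:00']),\n",
    "})\n",
    "flags = flag_one_second(toy)\n",
    "```\n",
    "\n",
    "Applying the same function to held-out data would show whether the rule holds beyond the training set."
   ]
  },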
  {
   "cell_type": "markdown",
   "id": "203ebd46",
   "metadata": {},
   "source": [
    "### Separating out 1 second transactions\n",
    "\n",
    "Since we found a one to one mapping between a feature of our dataset and fraud, let's filter out this data into its own DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "5e817dd1",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_one_second = df_train[filt].reset_index(drop=True)\n",
    "df_remaining = df_train[~filt].reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a1eda59",
   "metadata": {},
   "source": [
    "### Metadata for new datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "c245e6ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6021, 12)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_one_second.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "bfd80799",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(89979, 12)"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a73f23b",
   "metadata": {},
   "source": [
    "The percentage of rows with an \"Unknown\" country is similar in the two datasets: about 12.6% of the one-second rows and 14.6% of the remaining rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "ccb7cd0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "12.572662348447103"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df_one_second['country'] == \"Unknown\").mean() * 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "fb5496ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "14.586736905277897"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df_remaining['country'] == \"Unknown\").mean() * 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c90e87c8",
   "metadata": {},
   "source": [
    "Number of unique values in each column, first for the one-second data and then for the remaining data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "3741de6e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id           6021\n",
       "signup_time       6021\n",
       "purchase_time     6021\n",
       "purchase_value      80\n",
       "device_id          758\n",
       "source               3\n",
       "browser              5\n",
       "sex                  2\n",
       "age                 44\n",
       "ip_address         758\n",
       "class                1\n",
       "country             60\n",
       "dtype: int64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_one_second.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "0304855b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id           89979\n",
       "signup_time       89979\n",
       "purchase_time     89785\n",
       "purchase_value      118\n",
       "device_id         87735\n",
       "source                3\n",
       "browser               5\n",
       "sex                   2\n",
       "age                  56\n",
       "ip_address        89979\n",
       "class                 2\n",
       "country             172\n",
       "dtype: int64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining.nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c98ad08f",
   "metadata": {},
   "source": [
    "### Repeating IP Addresses\n",
    "\n",
    "From the counts above, the one-second data has only 758 unique IP addresses (and 758 unique device IDs) across 6,021 rows, so many of them repeat."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "b51744d1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.874758e+09    18\n",
       "1.797069e+09    16\n",
       "2.586669e+09    16\n",
       "1.502818e+09    16\n",
       "5.760609e+08    16\n",
       "Name: ip_address, dtype: int64"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_one_second['ip_address'].value_counts().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab15a0a1",
   "metadata": {},
   "source": [
    "In fact, all of the repeated IP addresses are in the one-second data. Every IP address in the remaining data appears exactly once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "ebf6a3f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining['ip_address'].is_unique"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58488111",
   "metadata": {},
   "source": [
    "### Distribution of transactions over time\n",
    "\n",
    "Let's look at the number of transactions per month, grouped by signup time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "eb6d91e8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "signup_time\n",
       "2015-01-31    6021\n",
       "Freq: M, dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_one_second.resample('M', on='signup_time').size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b666d7b7",
   "metadata": {},
   "source": [
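    "All of the one-second transactions took place in January 2015. Broken down by day, they all fall between January 1 and January 13, with roughly 400 to 600 per day except for only 71 on the last day."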
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "2d4d2aae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "signup_time\n",
       "2015-01-01    470\n",
       "2015-01-02    567\n",
       "2015-01-03    427\n",
       "2015-01-04    495\n",
       "2015-01-05    430\n",
       "2015-01-06    509\n",
       "2015-01-07    597\n",
       "2015-01-08    516\n",
       "2015-01-09    416\n",
       "2015-01-10    501\n",
       "2015-01-11    516\n",
       "2015-01-12    506\n",
       "2015-01-13     71\n",
       "Freq: D, dtype: int64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_one_second.resample('D', on='signup_time').size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f57124d0",
   "metadata": {},
   "source": [
    "The remaining transactions are fairly evenly distributed across signup months, from January through June 2015."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "b070bfe0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "signup_time\n",
       "2015-01    15310\n",
       "2015-02    13907\n",
       "2015-03    15558\n",
       "2015-04    15072\n",
       "2015-05    15369\n",
       "2015-06    14763\n",
       "Freq: M, dtype: int64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining.resample('M', on='signup_time', kind='period').size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee8bd5b6",
   "metadata": {},
   "source": [
    "## New baseline\n",
    "\n",
    "Now that we've separated out the one-second transactions, we'll recalculate the percentage of fraudulent transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "ad5ade55",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.04594405361250958"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining['class'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0b1e66c",
   "metadata": {},
   "source": [
    "We'll focus on building a model on the remaining dataset and look for groups that have significantly more fraud than **4.6%**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f130726",
   "metadata": {},
   "source": [
    "### Group again\n",
    "\n",
    "Let's look at some of the same groups from above with our new filtered data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "3069ee49",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>52477</td>\n",
       "      <td>4.7</td>\n",
       "      <td>36.863616</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>37502</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.844888</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     count  perc_fraud  avg_price\n",
       "sex                              \n",
       "M    52477         4.7  36.863616\n",
       "F    37502         4.5  36.844888"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, 'sex')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "fec8ddc7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>source</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Direct</th>\n",
       "      <td>18066</td>\n",
       "      <td>5.5</td>\n",
       "      <td>36.851378</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ads</th>\n",
       "      <td>35552</td>\n",
       "      <td>4.4</td>\n",
       "      <td>36.898796</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SEO</th>\n",
       "      <td>36361</td>\n",
       "      <td>4.3</td>\n",
       "      <td>36.815984</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        count  perc_fraud  avg_price\n",
       "source                              \n",
       "Direct  18066         5.5  36.851378\n",
       "Ads     35552         4.4  36.898796\n",
       "SEO     36361         4.3  36.815984"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, 'source')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "c516c766",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>browser</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Safari</th>\n",
       "      <td>14778</td>\n",
       "      <td>4.7</td>\n",
       "      <td>36.999865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chrome</th>\n",
       "      <td>36278</td>\n",
       "      <td>4.6</td>\n",
       "      <td>36.967914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Opera</th>\n",
       "      <td>2243</td>\n",
       "      <td>4.6</td>\n",
       "      <td>36.317432</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FireFox</th>\n",
       "      <td>14679</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.668779</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IE</th>\n",
       "      <td>22001</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.753875</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         count  perc_fraud  avg_price\n",
       "browser                              \n",
       "Safari   14778         4.7  36.999865\n",
       "Chrome   36278         4.6  36.967914\n",
       "Opera     2243         4.6  36.317432\n",
       "FireFox  14679         4.5  36.668779\n",
       "IE       22001         4.5  36.753875"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, 'browser')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "fde5afa1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>66</td>\n",
       "      <td>12.1</td>\n",
       "      <td>35.348485</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Belgium</th>\n",
       "      <td>230</td>\n",
       "      <td>8.3</td>\n",
       "      <td>36.556522</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Peru</th>\n",
       "      <td>52</td>\n",
       "      <td>7.7</td>\n",
       "      <td>38.403846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Philippines</th>\n",
       "      <td>95</td>\n",
       "      <td>6.3</td>\n",
       "      <td>35.157895</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Colombia</th>\n",
       "      <td>371</td>\n",
       "      <td>6.2</td>\n",
       "      <td>38.075472</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hong Kong</th>\n",
       "      <td>273</td>\n",
       "      <td>6.2</td>\n",
       "      <td>38.087912</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saudi Arabia</th>\n",
       "      <td>146</td>\n",
       "      <td>6.2</td>\n",
       "      <td>36.479452</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>380</td>\n",
       "      <td>6.1</td>\n",
       "      <td>36.528947</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chile</th>\n",
       "      <td>235</td>\n",
       "      <td>6.0</td>\n",
       "      <td>35.195745</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Malaysia</th>\n",
       "      <td>136</td>\n",
       "      <td>5.9</td>\n",
       "      <td>34.772059</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              count  perc_fraud  avg_price\n",
       "country                                   \n",
       "Algeria          66        12.1  35.348485\n",
       "Belgium         230         8.3  36.556522\n",
       "Peru             52         7.7  38.403846\n",
       "Philippines      95         6.3  35.157895\n",
       "Colombia        371         6.2  38.075472\n",
       "Hong Kong       273         6.2  38.087912\n",
       "Saudi Arabia    146         6.2  36.479452\n",
       "Argentina       380         6.1  36.528947\n",
       "Chile           235         6.0  35.195745\n",
       "Malaysia        136         5.9  34.772059"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, 'country')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "fbd7c576",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>purchase_value</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(90, 95]</th>\n",
       "      <td>247</td>\n",
       "      <td>6.9</td>\n",
       "      <td>92.793522</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(60, 65]</th>\n",
       "      <td>3348</td>\n",
       "      <td>5.2</td>\n",
       "      <td>62.886201</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(80, 85]</th>\n",
       "      <td>725</td>\n",
       "      <td>5.2</td>\n",
       "      <td>82.739310</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(75, 80]</th>\n",
       "      <td>1088</td>\n",
       "      <td>5.1</td>\n",
       "      <td>77.846507</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(65, 70]</th>\n",
       "      <td>2395</td>\n",
       "      <td>4.9</td>\n",
       "      <td>67.863466</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(25, 30]</th>\n",
       "      <td>9515</td>\n",
       "      <td>4.8</td>\n",
       "      <td>27.999054</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(45, 50]</th>\n",
       "      <td>6762</td>\n",
       "      <td>4.7</td>\n",
       "      <td>47.953268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(50, 55]</th>\n",
       "      <td>5443</td>\n",
       "      <td>4.7</td>\n",
       "      <td>52.896381</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(10, 15]</th>\n",
       "      <td>7759</td>\n",
       "      <td>4.6</td>\n",
       "      <td>13.082098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(70, 75]</th>\n",
       "      <td>1689</td>\n",
       "      <td>4.6</td>\n",
       "      <td>72.834221</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                count  perc_fraud  avg_price\n",
       "purchase_value                              \n",
       "(90, 95]          247         6.9  92.793522\n",
       "(60, 65]         3348         5.2  62.886201\n",
       "(80, 85]          725         5.2  82.739310\n",
       "(75, 80]         1088         5.1  77.846507\n",
       "(65, 70]         2395         4.9  67.863466\n",
       "(25, 30]         9515         4.8  27.999054\n",
       "(45, 50]         6762         4.7  47.953268\n",
       "(50, 55]         5443         4.7  52.896381\n",
       "(10, 15]         7759         4.6  13.082098\n",
       "(70, 75]         1689         4.6  72.834221"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = pd.cut(df_remaining['purchase_value'], bins=list(range(5, 105, 5)) + [200])\n",
    "fraud_group(df_remaining, g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "36fb0bda",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(60, 65]</th>\n",
       "      <td>159</td>\n",
       "      <td>5.7</td>\n",
       "      <td>36.402516</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(20, 25]</th>\n",
       "      <td>13503</td>\n",
       "      <td>4.9</td>\n",
       "      <td>36.962675</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(35, 40]</th>\n",
       "      <td>15790</td>\n",
       "      <td>4.8</td>\n",
       "      <td>37.056618</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(25, 30]</th>\n",
       "      <td>18241</td>\n",
       "      <td>4.6</td>\n",
       "      <td>36.813333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(15, 20]</th>\n",
       "      <td>5403</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.474366</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(40, 45]</th>\n",
       "      <td>10018</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.906069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(45, 50]</th>\n",
       "      <td>5032</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.864269</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(30, 35]</th>\n",
       "      <td>19186</td>\n",
       "      <td>4.4</td>\n",
       "      <td>36.738820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(50, 55]</th>\n",
       "      <td>2005</td>\n",
       "      <td>4.4</td>\n",
       "      <td>36.889776</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(55, 60]</th>\n",
       "      <td>618</td>\n",
       "      <td>4.2</td>\n",
       "      <td>36.708738</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          count  perc_fraud  avg_price\n",
       "age                                   \n",
       "(60, 65]    159         5.7  36.402516\n",
       "(20, 25]  13503         4.9  36.962675\n",
       "(35, 40]  15790         4.8  37.056618\n",
       "(25, 30]  18241         4.6  36.813333\n",
       "(15, 20]   5403         4.5  36.474366\n",
       "(40, 45]  10018         4.5  36.906069\n",
       "(45, 50]   5032         4.5  36.864269\n",
       "(30, 35]  19186         4.4  36.738820\n",
       "(50, 55]   2005         4.4  36.889776\n",
       "(55, 60]    618         4.2  36.708738"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = pd.cut(df_remaining['age'], bins=list(range(15, 70, 5)) + [100])\n",
    "fraud_group(df_remaining, g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "15cb85a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th>browser</th>\n",
       "      <th>source</th>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Direct</th>\n",
       "      <th>F</th>\n",
       "      <td>62</td>\n",
       "      <td>11.3</td>\n",
       "      <td>38.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Korea Republic of</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Direct</th>\n",
       "      <th>F</th>\n",
       "      <td>104</td>\n",
       "      <td>10.6</td>\n",
       "      <td>35.221154</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <th>IE</th>\n",
       "      <th>Ads</th>\n",
       "      <th>F</th>\n",
       "      <td>115</td>\n",
       "      <td>10.4</td>\n",
       "      <td>38.513043</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Japan</th>\n",
       "      <th>Safari</th>\n",
       "      <th>SEO</th>\n",
       "      <th>F</th>\n",
       "      <td>141</td>\n",
       "      <td>9.9</td>\n",
       "      <td>37.241135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Ads</th>\n",
       "      <th>M</th>\n",
       "      <td>99</td>\n",
       "      <td>9.1</td>\n",
       "      <td>35.959596</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brazil</th>\n",
       "      <th>IE</th>\n",
       "      <th>SEO</th>\n",
       "      <th>M</th>\n",
       "      <td>99</td>\n",
       "      <td>9.1</td>\n",
       "      <td>39.444444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>Direct</th>\n",
       "      <th>F</th>\n",
       "      <td>77</td>\n",
       "      <td>9.1</td>\n",
       "      <td>35.766234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <th>IE</th>\n",
       "      <th>Direct</th>\n",
       "      <th>M</th>\n",
       "      <td>56</td>\n",
       "      <td>8.9</td>\n",
       "      <td>36.839286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sweden</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>SEO</th>\n",
       "      <th>M</th>\n",
       "      <td>58</td>\n",
       "      <td>8.6</td>\n",
       "      <td>36.017241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mexico</th>\n",
       "      <th>Chrome</th>\n",
       "      <th>SEO</th>\n",
       "      <th>M</th>\n",
       "      <td>59</td>\n",
       "      <td>8.5</td>\n",
       "      <td>40.084746</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      count  perc_fraud  avg_price\n",
       "country           browser source sex                              \n",
       "Canada            Chrome  Direct F       62        11.3  38.500000\n",
       "Korea Republic of Chrome  Direct F      104        10.6  35.221154\n",
       "United Kingdom    IE      Ads    F      115        10.4  38.513043\n",
       "Japan             Safari  SEO    F      141         9.9  37.241135\n",
       "Australia         Chrome  Ads    M       99         9.1  35.959596\n",
       "Brazil            IE      SEO    M       99         9.1  39.444444\n",
       "United Kingdom    Chrome  Direct F       77         9.1  35.766234\n",
       "France            IE      Direct M       56         8.9  36.839286\n",
       "Sweden            Chrome  SEO    M       58         8.6  36.017241\n",
       "Mexico            Chrome  SEO    M       59         8.5  40.084746"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, ['country', 'browser', 'source', 'sex'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "630d0b8b",
   "metadata": {},
   "source": [
    "### More date features\n",
    "\n",
    "New features can be generated for each of the date columns such as:\n",
    "\n",
    "* day name\n",
    "* month\n",
    "* hour\n",
    "* minute\n",
    "\n",
    "We begin with looking at the day name of the signup time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "1a42dd9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    Thursday\n",
       "1    Thursday\n",
       "2    Thursday\n",
       "3    Thursday\n",
       "4    Thursday\n",
       "Name: signup_time, dtype: object"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = df_remaining['signup_time'].dt.day_name()\n",
    "g.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fea65ea6",
   "metadata": {},
   "source": [
    "Calculating fraud by day name doesn't appear to provide much signal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "7f0a4095",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>signup_time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Monday</th>\n",
       "      <td>12971</td>\n",
       "      <td>4.8</td>\n",
       "      <td>36.724771</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saturday</th>\n",
       "      <td>12840</td>\n",
       "      <td>4.7</td>\n",
       "      <td>37.023209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wednesday</th>\n",
       "      <td>12239</td>\n",
       "      <td>4.7</td>\n",
       "      <td>37.027535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sunday</th>\n",
       "      <td>12942</td>\n",
       "      <td>4.6</td>\n",
       "      <td>36.864936</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Friday</th>\n",
       "      <td>13059</td>\n",
       "      <td>4.5</td>\n",
       "      <td>37.110575</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thursday</th>\n",
       "      <td>12972</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.612319</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tuesday</th>\n",
       "      <td>12956</td>\n",
       "      <td>4.4</td>\n",
       "      <td>36.636771</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             count  perc_fraud  avg_price\n",
       "signup_time                              \n",
       "Monday       12971         4.8  36.724771\n",
       "Saturday     12840         4.7  37.023209\n",
       "Wednesday    12239         4.7  37.027535\n",
       "Sunday       12942         4.6  36.864936\n",
       "Friday       13059         4.5  37.110575\n",
       "Thursday     12972         4.5  36.612319\n",
       "Tuesday      12956         4.4  36.636771"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, g)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "368ed38e",
   "metadata": {},
   "source": [
    "Similarly, no signal for purchase time day name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "6beeaa63",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>purchase_time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Saturday</th>\n",
       "      <td>12741</td>\n",
       "      <td>4.9</td>\n",
       "      <td>37.022526</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tuesday</th>\n",
       "      <td>12877</td>\n",
       "      <td>4.7</td>\n",
       "      <td>36.869923</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wednesday</th>\n",
       "      <td>12798</td>\n",
       "      <td>4.7</td>\n",
       "      <td>36.948117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sunday</th>\n",
       "      <td>13120</td>\n",
       "      <td>4.6</td>\n",
       "      <td>36.864787</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Friday</th>\n",
       "      <td>12817</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.868612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thursday</th>\n",
       "      <td>12802</td>\n",
       "      <td>4.5</td>\n",
       "      <td>36.704812</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Monday</th>\n",
       "      <td>12824</td>\n",
       "      <td>4.2</td>\n",
       "      <td>36.712648</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               count  perc_fraud  avg_price\n",
       "purchase_time                              \n",
       "Saturday       12741         4.9  37.022526\n",
       "Tuesday        12877         4.7  36.869923\n",
       "Wednesday      12798         4.7  36.948117\n",
       "Sunday         13120         4.6  36.864787\n",
       "Friday         12817         4.5  36.868612\n",
       "Thursday       12802         4.5  36.704812\n",
       "Monday         12824         4.2  36.712648"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, df_remaining['purchase_time'].dt.day_name())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b580706c",
   "metadata": {},
   "source": [
    "The hour does not seem to matter either."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "88db34df",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>purchase_time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3770</td>\n",
       "      <td>5.2</td>\n",
       "      <td>36.646419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3741</td>\n",
       "      <td>4.9</td>\n",
       "      <td>37.130981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>3693</td>\n",
       "      <td>4.9</td>\n",
       "      <td>36.985649</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3726</td>\n",
       "      <td>4.8</td>\n",
       "      <td>36.733763</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3760</td>\n",
       "      <td>4.8</td>\n",
       "      <td>37.044947</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>3783</td>\n",
       "      <td>4.8</td>\n",
       "      <td>36.823685</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>3616</td>\n",
       "      <td>4.8</td>\n",
       "      <td>36.814159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>3774</td>\n",
       "      <td>4.8</td>\n",
       "      <td>36.855856</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>3706</td>\n",
       "      <td>4.8</td>\n",
       "      <td>37.397733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3649</td>\n",
       "      <td>4.7</td>\n",
       "      <td>37.221156</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               count  perc_fraud  avg_price\n",
       "purchase_time                              \n",
       "1               3770         5.2  36.646419\n",
       "7               3741         4.9  37.130981\n",
       "20              3693         4.9  36.985649\n",
       "0               3726         4.8  36.733763\n",
       "5               3760         4.8  37.044947\n",
       "8               3783         4.8  36.823685\n",
       "10              3616         4.8  36.814159\n",
       "17              3774         4.8  36.855856\n",
       "18              3706         4.8  37.397733\n",
       "4               3649         4.7  37.221156"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, df_remaining['purchase_time'].dt.hour)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "c3978c3b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>perc_fraud</th>\n",
       "      <th>avg_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>purchase_time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1478</td>\n",
       "      <td>6.2</td>\n",
       "      <td>36.802436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1563</td>\n",
       "      <td>5.6</td>\n",
       "      <td>37.338452</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>1489</td>\n",
       "      <td>5.6</td>\n",
       "      <td>36.497649</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>1451</td>\n",
       "      <td>5.4</td>\n",
       "      <td>37.156444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>1487</td>\n",
       "      <td>5.4</td>\n",
       "      <td>36.845999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>1505</td>\n",
       "      <td>5.4</td>\n",
       "      <td>36.476412</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1500</td>\n",
       "      <td>5.3</td>\n",
       "      <td>36.733333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>1558</td>\n",
       "      <td>5.2</td>\n",
       "      <td>37.295892</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>1537</td>\n",
       "      <td>5.2</td>\n",
       "      <td>37.245283</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>1433</td>\n",
       "      <td>5.2</td>\n",
       "      <td>36.431263</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               count  perc_fraud  avg_price\n",
       "purchase_time                              \n",
       "24              1478         6.2  36.802436\n",
       "8               1563         5.6  37.338452\n",
       "19              1489         5.6  36.497649\n",
       "29              1451         5.4  37.156444\n",
       "38              1487         5.4  36.845999\n",
       "49              1505         5.4  36.476412\n",
       "11              1500         5.3  36.733333\n",
       "39              1558         5.2  37.295892\n",
       "55              1537         5.2  37.245283\n",
       "58              1433         5.2  36.431263"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fraud_group(df_remaining, df_remaining['purchase_time'].dt.minute)"
   ]
  },
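  {
   "cell_type": "markdown",
   "id": "9c2e7a41",
   "metadata": {},
   "source": [
    "The date-part checks above all follow the same pattern: derive a Series from a datetime column with the `.dt` accessor, then group the fraud label by it. Below is a minimal sketch of that pattern on hypothetical toy data. The `toy` frame and the `date_parts` dictionary are illustrative names, and a plain `groupby` stands in for the `fraud_group` helper.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data standing in for df_remaining\n",
    "toy = pd.DataFrame({\n",
    "    'purchase_time': pd.to_datetime([\n",
    "        '2015-01-05 01:15:00',  # Monday\n",
    "        '2015-01-05 13:45:00',  # Monday\n",
    "        '2015-01-06 01:30:00',  # Tuesday\n",
    "        '2015-01-10 20:05:00',  # Saturday\n",
    "    ]),\n",
    "    'class': [1, 0, 0, 1],\n",
    "})\n",
    "\n",
    "# Each date part is a Series aligned with the frame, so it can serve as a grouper\n",
    "date_parts = {\n",
    "    'day_name': toy['purchase_time'].dt.day_name(),\n",
    "    'month': toy['purchase_time'].dt.month,\n",
    "    'hour': toy['purchase_time'].dt.hour,\n",
    "    'minute': toy['purchase_time'].dt.minute,\n",
    "}\n",
    "\n",
    "# Group size and fraud rate for every date part\n",
    "fraud_by_part = {\n",
    "    name: toy.groupby(part)['class'].agg(['size', 'mean'])\n",
    "    for name, part in date_parts.items()\n",
    "}\n",
    "print(fraud_by_part['day_name'])\n",
    "```\n",
    "\n",
    "Looping over the parts this way also covers the month, which is listed above but not examined separately."
   ]
  },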
  {
   "cell_type": "markdown",
   "id": "015171e7",
   "metadata": {},
   "source": [
    "### Device ID\n",
    "\n",
    "We now turn our attention to device_id. From the number of unique values, we know there are some that repeat. Let's look at those first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "e891976b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WENNLJYHVVSCR    3\n",
       "CGLAEGEJMRFXY    3\n",
       "TBEXEPAUWGUWW    3\n",
       "KZYECBRGTWQDJ    3\n",
       "TUTIBAJWVRPPI    3\n",
       "Name: device_id, dtype: int64"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_remaining['device_id'].value_counts()[lambda x: x > 1].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2f20c54",
   "metadata": {},
   "source": [
    "The repeated device IDs are assigned to a variable. The `df_train` DataFrame is used to check for duplicates in the entire dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "e5ebb3a4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['ITUMJCKWEYNDD', 'KIPFSCNUGOLDP', 'EQYVNEGOFLAWK', 'ZUSVMDEZRBDTX',\n",
       "       'IXNWEKWJGNLNH', 'CDFXVYHOIHPYP', 'UFBULQADXSSOG', 'SDJQRPKXQFBED',\n",
       "       'IGKYVZDBEGALB', 'SUEKLSZWLASFR'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "repeat_device_id = df_train['device_id'].value_counts()[lambda x: x > 1].index\n",
    "repeat_device_id[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e80c6e26",
   "metadata": {},
   "source": [
    "All transactions for these repeats are placed in their own DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "1af86fab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>signup_time</th>\n",
       "      <th>purchase_time</th>\n",
       "      <th>purchase_value</th>\n",
       "      <th>device_id</th>\n",
       "      <th>source</th>\n",
       "      <th>browser</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>ip_address</th>\n",
       "      <th>class</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>209275</td>\n",
       "      <td>2015-01-01 04:25:21</td>\n",
       "      <td>2015-01-28 10:39:35</td>\n",
       "      <td>57</td>\n",
       "      <td>AAAXXOZJRZRAO</td>\n",
       "      <td>Ads</td>\n",
       "      <td>FireFox</td>\n",
       "      <td>F</td>\n",
       "      <td>36</td>\n",
       "      <td>1.377849e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>187731</td>\n",
       "      <td>2015-01-12 23:29:37</td>\n",
       "      <td>2015-04-17 17:31:00</td>\n",
       "      <td>42</td>\n",
       "      <td>AANYBGQSWHRTK</td>\n",
       "      <td>SEO</td>\n",
       "      <td>Safari</td>\n",
       "      <td>M</td>\n",
       "      <td>29</td>\n",
       "      <td>2.707984e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>136233</td>\n",
       "      <td>2015-04-03 01:36:48</td>\n",
       "      <td>2015-07-18 11:12:37</td>\n",
       "      <td>32</td>\n",
       "      <td>ABPUTDOGTTISP</td>\n",
       "      <td>Direct</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>M</td>\n",
       "      <td>35</td>\n",
       "      <td>3.001642e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>Turkey</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>102967</td>\n",
       "      <td>2015-05-31 23:59:55</td>\n",
       "      <td>2015-09-22 14:22:00</td>\n",
       "      <td>65</td>\n",
       "      <td>ABPUTDOGTTISP</td>\n",
       "      <td>Ads</td>\n",
       "      <td>Chrome</td>\n",
       "      <td>F</td>\n",
       "      <td>37</td>\n",
       "      <td>2.282258e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id         signup_time       purchase_time  purchase_value  \\\n",
       "0   209275 2015-01-01 04:25:21 2015-01-28 10:39:35              57   \n",
       "1   187731 2015-01-12 23:29:37 2015-04-17 17:31:00              42   \n",
       "2   136233 2015-04-03 01:36:48 2015-07-18 11:12:37              32   \n",
       "3   102967 2015-05-31 23:59:55 2015-09-22 14:22:00              65   \n",
       "\n",
       "       device_id  source  browser sex  age    ip_address  class  \\\n",
       "0  AAAXXOZJRZRAO     Ads  FireFox   F   36  1.377849e+09      0   \n",
       "1  AANYBGQSWHRTK     SEO   Safari   M   29  2.707984e+09      0   \n",
       "2  ABPUTDOGTTISP  Direct   Chrome   M   35  3.001642e+09      1   \n",
       "3  ABPUTDOGTTISP     Ads   Chrome   F   37  2.282258e+09      0   \n",
       "\n",
       "          country  \n",
       "0  United Kingdom  \n",
       "1          France  \n",
       "2          Turkey  \n",
       "3   United States  "
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_repeat_device = df_remaining.query('device_id in @repeat_device_id') \\\n",
    "                               .sort_values('device_id', ignore_index=True)\n",
    "df_repeat_device.head(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "f4c567cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5052, 12)"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_repeat_device.shape"
   ]
  },
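  {
   "cell_type": "markdown",
   "id": "4be81d07",
   "metadata": {},
   "source": [
    "The selection above has two steps: find the IDs that repeat in the full dataset, then keep the rows of the subset whose ID is in that list. Below is a minimal sketch of the same idea on hypothetical toy data. The names `full`, `subset`, `repeat_ids` and `repeat_rows` are illustrative, and `isin` is used here as an alternative to the `query` filter.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-ins for df_train (full set) and df_remaining (subset)\n",
    "full = pd.DataFrame({'device_id': ['A', 'B', 'A', 'C', 'B', 'D']})\n",
    "subset = full.iloc[[0, 1, 3, 5]]\n",
    "\n",
    "# IDs that occur more than once in the full data\n",
    "counts = full['device_id'].value_counts()\n",
    "repeat_ids = counts[counts > 1].index\n",
    "\n",
    "# Keep only the subset rows whose device ID repeats somewhere in the full data\n",
    "repeat_rows = subset[subset['device_id'].isin(repeat_ids)]\n",
    "print(repeat_rows)\n",
    "```\n",
    "\n",
    "Because the repeats are counted on the full data, a device can appear only once in the subset and still be kept."
   ]
  },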
  {
   "cell_type": "markdown",
   "id": "5e5b2fea",
   "metadata": {},
   "source": [
    "### Higher fraud for repeated device ID\n",
    "\n",
    "Overall, there is significantly more fraud for these transactions. An average of **21%** are fraud."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "003c75f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.21357878068091846"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_repeat_device['class'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec34a900",
   "metadata": {},
   "source": [
    "A feature is created to indicate the number of the transaction for the given device ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "0dabe6b8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>device_id</th>\n",
       "      <th>device_ct</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAAXXOZJRZRAO</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AANYBGQSWHRTK</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ABPUTDOGTTISP</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ABPUTDOGTTISP</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ABWSNQWGCFARL</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>ABWSNQWGCFARL</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       device_id  device_ct  class\n",
       "0  AAAXXOZJRZRAO          1      0\n",
       "1  AANYBGQSWHRTK          1      0\n",
       "2  ABPUTDOGTTISP          1      1\n",
       "3  ABPUTDOGTTISP          2      0\n",
       "4  ABWSNQWGCFARL          1      1\n",
       "5  ABWSNQWGCFARL          2      0"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_repeat_device['device_ct'] = df_repeat_device.groupby('device_id').cumcount() + 1\n",
    "df_repeat_device[['device_id', 'device_ct', 'class']].head(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7121b5d",
   "metadata": {},
   "source": [
    "This enables us to test whether the first, second transaction has more probability of fraud. Below, we see that the second transaction from the same device ID has similar fraud as the first. Only a few have more than two transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "1df272cc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>size</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>device_ct</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2808</td>\n",
       "      <td>0.194444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2225</td>\n",
       "      <td>0.238202</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>19</td>\n",
       "      <td>0.157895</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           size      mean\n",
       "device_ct                \n",
       "1          2808  0.194444\n",
       "2          2225  0.238202\n",
       "3            19  0.157895"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_repeat_device.groupby('device_ct')['class'].agg(['size', 'mean'])"
   ]
  },
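  {
   "cell_type": "markdown",
   "id": "e07d5c36",
   "metadata": {},
   "source": [
    "The counter produced by `cumcount` follows the row order within each device, so it only reflects chronological order if the rows are also sorted by time. Below is a minimal sketch, on hypothetical toy data, of a counter that is explicitly chronological. Ordering by `purchase_time` is an assumption made for illustration, not a choice taken from the analysis above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; column names mirror the notebook\n",
    "toy = pd.DataFrame({\n",
    "    'device_id': ['B', 'A', 'B', 'A', 'C'],\n",
    "    'purchase_time': pd.to_datetime([\n",
    "        '2015-03-01', '2015-02-01', '2015-01-15', '2015-01-10', '2015-04-01',\n",
    "    ]),\n",
    "    'class': [0, 1, 1, 0, 0],\n",
    "})\n",
    "\n",
    "# Sort by device, then by time, so cumcount numbers transactions chronologically\n",
    "toy = toy.sort_values(['device_id', 'purchase_time'], ignore_index=True)\n",
    "toy['device_ct'] = toy.groupby('device_id').cumcount() + 1\n",
    "\n",
    "# Group size and fraud rate per transaction number\n",
    "print(toy.groupby('device_ct')['class'].agg(['size', 'mean']))\n",
    "```"
   ]
  },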
  {
   "cell_type": "markdown",
   "id": "38499d16",
   "metadata": {},
   "source": [
    "### Streaks of fraudulent transactions\n",
    "\n",
    "In this section, we'll look to see if fraudulent transactions occur in streaks. Our data is sorted by signup time, so we'll start with it. We shift the class column values down one which has the effect of looking at whether the very previous signup transaction was fraudulent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cefc78b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_remaining.groupby(df_remaining['class'].shift())['class'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ac53ee1",
   "metadata": {},
   "source": [
    "No signal is found. We can look at the previous 100 transactions and group by whether there are more or less than 5 frauds as a proxy for streaks. Again, no signal is detected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3cb8d9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "g = df_remaining['class'].rolling(100).sum().shift() > 5\n",
    "df_remaining.groupby(g)['class'].agg(['size', 'mean'])"
   ]
  },
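  {
   "cell_type": "markdown",
   "id": "7a90b3f8",
   "metadata": {},
   "source": [
    "To make the two lag features concrete, here is a minimal sketch on a hypothetical six-row label series. The window of 3 is chosen only to keep the example small; the analysis above uses 100.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical fraud labels in time order\n",
    "labels = pd.Series([0, 1, 1, 0, 0, 1], name='class')\n",
    "\n",
    "# Label of the immediately preceding row (NaN for the first row)\n",
    "prev_label = labels.shift()\n",
    "\n",
    "# Fraud count over the 3 rows before the current one; shifting the rolling\n",
    "# sum by one keeps the current row out of its own window\n",
    "prev_window = labels.rolling(3).sum().shift()\n",
    "\n",
    "print(pd.DataFrame({'class': labels, 'prev': prev_label, 'prev_3_sum': prev_window}))\n",
    "\n",
    "# Fraud rate conditional on the previous label\n",
    "print(labels.groupby(prev_label).mean())\n",
    "```\n",
    "\n",
    "Rows without a full window have a NaN count. A comparison such as `> 5` evaluates to False for them, so they fall into the group without a streak."
   ]
  },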
  {
   "cell_type": "markdown",
   "id": "0b7a38ed",
   "metadata": {},
   "source": [
    "Let's complete the same analysis, but sort by purchase time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d81cd75",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_remaining_purch = df_remaining.sort_values('purchase_time', ignore_index=True)\n",
    "df_remaining_purch.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5aaf402",
   "metadata": {},
   "source": [
    "No signal is present in this ordering either."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca4e2ba2",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_remaining_purch.groupby(df_remaining_purch['class'].shift())['class'].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1212bf5",
   "metadata": {},
   "outputs": [],
   "source": [
    "g = df_remaining_purch['class'].rolling(100).sum().shift() > 5\n",
    "df_remaining_purch.groupby(g)['class'].agg(['size', 'mean'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b70abe79",
   "metadata": {},
   "source": [
    "## Simplest model without machine learning\n",
    "\n",
    "I like to formulate models without machine learning that can act as a baseline. There appear to be only two significant signals thus far:\n",
    "\n",
    "* One second until purchase\n",
    "* Repeated device ID\n",
    "\n",
    "Because the one-second purchases were 100% fraudulent, we have already separated them into their own DataFrame. Any future transaction that occurs one second after signup will be flagged. As a cautionary note, all of these transactions happened in January, so future one-second events would have to be watched carefully.\n",
    "\n",
    "### Expected value of flagged fraud\n",
    "\n",
    "Transactions from repeated device IDs were about 21% fraudulent, much higher than the 4.6% baseline. However, we must take into account the cost of an incorrectly flagged transaction (8 dollars): even knowing that a device ID repeats, a flag will be wrong about 79% of the time. Flagging breaks even when the expected savings, the purchase price $P$ multiplied by the probability of fraud $p_f$, exceed the expected cost, 8 dollars multiplied by the probability of no fraud. We have the following inequality:\n",
    "\n",
    "$$ P * p_{f} > (1 - p_{f}) * 8$$\n",
    "\n",
    "We solve for the minimum purchase price."
   ]
  },
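  {
   "cell_type": "markdown",
   "id": "c47e91aa",
   "metadata": {},
   "source": [
    "Dividing both sides of the inequality by $p_f$, which is positive, isolates the price:\n",
    "\n",
    "$$ P > \\frac{8 (1 - p_{f})}{p_{f}} $$\n",
    "\n",
    "As a rough check, assume the pooled fraud rate implied by the group sizes and rates in the `device_ct` table above, $p_f \\approx 1079 / 5052 \\approx 0.2136$. The break-even price is then $8 \\times 0.7864 / 0.2136 \\approx 29.5$ dollars."
   ]
  },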
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ff8618d",
   "metadata": {},
   "outputs": [],
   "source": [
    "INCORRECT_FRAUD_COST = 8\n",
    "def calc_min_price(p):\n",
    "    min_purchase_price = (1 - p) * INCORRECT_FRAUD_COST / p\n",
    "    return min_purchase_price\n",
    "\n",
    "p = df_repeat_device['class'].mean()\n",
    "calc_min_price(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc9e2e02",
   "metadata": {},
   "source": [
    "Therefore, the simplest model we can build flags every transaction from a repeated device with a purchase value of 29.5 dollars or more. The function below applies a minimum-price threshold and records, for each transaction, the flag, the cost (stored as a negative number), the amount saved, and the revenue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bc91bcc",
   "metadata": {},
   "outputs": [],
   "source": [
    "def calc_income(df, min_price):\n",
    "    # Work on a copy so the caller's DataFrame is not modified\n",
    "    df = df[['purchase_value', 'class']].copy()\n",
    "    df['flag'] = 0\n",
    "    df['cost'] = 0\n",
    "    df['saved'] = 0\n",
    "    df['revenue'] = 0\n",
    "    # Flag every transaction at or above the minimum price\n",
    "    is_flag = df['purchase_value'] >= min_price\n",
    "    is_fraud = df['class'] == 1\n",
    "    df.loc[is_flag, 'flag'] = 1\n",
    "    false_pos = is_flag & ~is_fraud\n",
    "    false_neg = ~is_flag & is_fraud\n",
    "    true_pos = is_flag & is_fraud\n",
    "    # Costs are negative: a fixed fee per false positive, the full purchase per missed fraud\n",
    "    df.loc[false_pos, 'cost'] = -INCORRECT_FRAUD_COST\n",
    "    df.loc[false_neg, 'cost'] = -df['purchase_value']\n",
    "    # Caught fraud saves the purchase value; only legitimate purchases count as revenue\n",
    "    df.loc[true_pos, 'saved'] = df['purchase_value']\n",
    "    df.loc[df['class'] == 0, 'revenue'] = df['purchase_value']\n",
    "    df['income'] = df['revenue'] + df['cost']\n",
    "    return df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "633216e2",
   "metadata": {},
   "source": [
    "Running this function with a minimum price of 29 dollars yields the following results. The first two transactions are incorrectly flagged as fraud, costing the company 8 dollars each. The next transaction is correctly flagged as fraud, saving the company 32 dollars. The 18th transaction is fraudulent but not flagged, costing the company 19 dollars."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56d0ab48",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_income = calc_income(df_repeat_device, 29)\n",
    "df_income.head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b59aef9",
   "metadata": {},
   "source": [
    "We can test our simple model by finding the total cost at every integer price threshold from 9 to 99."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c367f68",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series({i: calc_income(df_repeat_device, i)['cost'].sum() for i in range(9, 100)})\n",
    "s.head(10)"
   ]
  },
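  {
   "cell_type": "markdown",
   "id": "aff5c66c",
   "metadata": {},
   "source": [
    "The threshold with the smallest total loss is 31, very close to our calculation of 29.5. Because costs are stored as negative numbers, `idxmax` returns the threshold whose total cost is closest to zero."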
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a593beac",
   "metadata": {},
   "outputs": [],
   "source": [
    "s.idxmax()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80f09c69",
   "metadata": {},
   "source": [
    "We can plot how the total cost varies with the threshold. A threshold near 100 is effectively no flagging at all (almost no transactions are marked as fraud), since very few purchases are made above that price point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "052c1a32",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_threshold(df, col, title):\n",
    "    s = pd.Series({i: calc_income(df, i)[col].sum() for i in range(9, 100)})\n",
    "    ax = s.plot(figsize=(5, 2.5))\n",
    "    ax.set_ylabel('Cost')\n",
    "    ax.set_xlabel('Minimum Price Threshold')\n",
    "    ax.set_title(title);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b4c3cec",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plot_threshold(df_repeat_device, 'cost', 'Repeated Device Fraud Threshold')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e906889",
   "metadata": {},
   "source": [
    "### Unique device ID\n",
    "\n",
    "Let's put the transactions coming from a unique device ID in their own DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b77760b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_unique_device = df_remaining.query('device_id not in @repeat_device_id')\n",
    "df_unique_device.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "917713c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_unique_device['device_id'].is_unique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac6596d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_unique_device.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a997109d",
   "metadata": {},
   "source": [
    "We now have three mutually exclusive DataFrames that together contain all of the data:\n",
    "\n",
    "* df_one_second\n",
    "* df_repeat_device\n",
    "* df_unique_device\n",
    "\n",
    "Let's verify that the number of rows equals 96,000."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d10dea",
   "metadata": {},
   "outputs": [],
   "source": [
    "len(df_one_second) + len(df_repeat_device) + len(df_unique_device)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "393d46cd",
   "metadata": {},
   "source": [
    "Let's get the baseline fraud for this subset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d1ef128",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_unique_device['class'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2a44bb1",
   "metadata": {},
   "source": [
    "It's down to just 3.6%. It might be difficult to find a group within here that makes it worthwhile to flag as fraud. Even when grouping by 4 variables below, the highest fraud percent is 10%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fa6487e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fraud_group(df_unique_device, ['country', 'source', 'browser', 'sex'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0758c30f",
   "metadata": {},
   "source": [
    "Let's calculate the minimum price needed to flag a transaction as fraudulent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd827cdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "calc_min_price(df_unique_device['class'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f341815c",
   "metadata": {},
   "source": [
    "This is greater than the maximum purchase value, so no transactions would be flagged. Let's plot the cost against the price threshold for this subset. As expected, no threshold is viable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92a91084",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plot_threshold(df_unique_device, 'cost', 'Unique Device Fraud Threshold')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91b2b620",
   "metadata": {},
   "source": [
    "## Formal machine learning\n",
    "\n",
    "In this section, we take a more formal approach to model building using the pre-built machine learning algorithms in the scikit-learn Python library. As above, we start with a simple model and gradually increase its complexity. We work with the entire training dataset to show how transformations can be automated and passed through a machine learning pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9ee85f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39476992",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import OneHotEncoder, FunctionTransformer, KBinsDiscretizer\n",
    "from sklearn.compose import ColumnTransformer\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a64a426b",
   "metadata": {},
   "source": [
    "Two functions are defined to create two boolean features. The first flags whether the purchase took place exactly one second after signup, and the second flags whether the device ID appears more than once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f6d2c72",
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_one_second(df, return_frame=True):\n",
    "    s = (df['purchase_time'] - df['signup_time']).dt.total_seconds() == 1\n",
    "    if return_frame:\n",
    "        return s.to_frame()\n",
    "    return s\n",
    "\n",
    "def is_dupe_device(s, return_frame=True):\n",
    "    s = s.duplicated(keep=False)\n",
    "    if return_frame:\n",
    "        return s.to_frame()\n",
    "    return s\n",
    "\n",
    "ft_one_second = FunctionTransformer(is_one_second)\n",
    "ft_dupe_device = FunctionTransformer(is_dupe_device)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdd82917",
   "metadata": {},
   "source": [
    "A machine learning pipeline is built to transform the columns before sending them to the logistic regression model which is used for classification. The source, browser, and sex nominal categorical variables are one-hot encoded. The above functions are used to produce the other two features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0739e2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "ct = ColumnTransformer([\n",
    "    ('cat', OneHotEncoder(), ['source', 'browser', 'sex']),\n",
    "    ('one_second', ft_one_second, ['signup_time', 'purchase_time']),\n",
    "    ('dev', ft_dupe_device, 'device_id'),\n",
    "])\n",
    "lr = LogisticRegression(max_iter=1000)\n",
    "pipe = Pipeline([\n",
    "    ('ct', ct), \n",
    "    ('lr', lr)\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b3feca2",
   "metadata": {},
   "source": [
    "The pipeline is fit (variables transformed and model trained) and the probabilities of fraud are returned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2abab1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_true = df_train['class']\n",
    "pipe.fit(df_train, y=y_true);\n",
    "probs = pipe.predict_proba(df_train)\n",
    "probs[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb6ab4f2",
   "metadata": {},
   "source": [
    "The probability of fraud is contained in the second column and assigned to the variable name `y_pred`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04b025f8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "y_pred = probs[:, 1]\n",
    "y_pred"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5cfd961",
   "metadata": {},
   "source": [
    "The maximum probability of fraud was over 99%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d987e199",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "y_pred.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d80a167b",
   "metadata": {},
   "source": [
    "Let's see if the model was able to find the one second transactions that were all fraudulent by filtering for all transactions with higher than 99% probability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b24578c",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train[y_pred > .99].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ae4c211",
   "metadata": {},
   "source": [
    "These are the same 6,021 rows found earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11d69fc9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_one_second.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53d9f60f",
   "metadata": {},
   "source": [
    "Let's look at the probabilities of fraud for the duplicated device IDs that are not one second transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93ff0a76",
   "metadata": {},
   "outputs": [],
   "source": [
    "filt1 = is_dupe_device(df_train['device_id'], return_frame=False)\n",
    "filt2 = is_one_second(df_train, return_frame=False)\n",
    "filt = filt1 & ~filt2\n",
    "prob_fraud_dupe_device = y_pred[filt]\n",
    "prob_fraud_dupe_device[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72f758fa",
   "metadata": {},
   "source": [
    "These probabilities range from 20% to 26%, close to the 21% fraud rate we calculated without machine learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c090d816",
   "metadata": {},
   "outputs": [],
   "source": [
    "prob_fraud_dupe_device.min(), prob_fraud_dupe_device.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b468d3fa",
   "metadata": {},
   "source": [
    "### Confusion matrix\n",
    "\n",
    "Let's look at all four outcomes (true positives, true negatives, false positives, and false negatives) by creating a confusion matrix. First, we use our `calc_income` function to get the final flagging decision for each transaction."
   ]
  },
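  {
   "cell_type": "markdown",
   "id": "2b90f6d4",
   "metadata": {},
   "source": [
    "Because `y_pred` holds one probability per transaction, the break-even price becomes a per-transaction threshold rather than a single number. Here is a minimal sketch with hypothetical probabilities. The names `toy_probs`, `toy_min_price`, and `FALSE_POSITIVE_COST` are illustrative, and the 8-dollar constant mirrors `INCORRECT_FRAUD_COST`. A near-certain fraud is worth flagging at almost any price, while a low-probability transaction would need a very large purchase value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b90f6d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "FALSE_POSITIVE_COST = 8  # mirrors INCORRECT_FRAUD_COST defined earlier\n",
    "# Hypothetical predicted fraud probabilities, not model output\n",
    "toy_probs = np.array([0.99, 0.20, 0.04])\n",
    "# Same break-even formula as calc_min_price, applied elementwise\n",
    "toy_min_price = (1 - toy_probs) * FALSE_POSITIVE_COST / toy_probs\n",
    "toy_min_price.round(2)"
   ]
  },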
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50ba61ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "min_price = calc_min_price(y_pred)\n",
    "df_income = calc_income(df_train, min_price)\n",
    "df_income.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5116a43e",
   "metadata": {},
   "source": [
    "We use scikit-learn to create the confusion matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52f90f11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix\n",
    "def create_confusion(y_true, df_income, filt=None):\n",
    "    y_pred = df_income['flag']\n",
    "    if filt is not None:\n",
    "        y_true = y_true[filt]\n",
    "        y_pred = y_pred[filt]\n",
    "    df_conf = pd.DataFrame(confusion_matrix(y_true, y_pred))\n",
    "    df_conf.index.name = 'actual'\n",
    "    df_conf.columns.name = 'predicted'\n",
    "    return df_conf\n",
    "\n",
    "create_confusion(y_true, df_income)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "accb5b2a",
   "metadata": {},
   "source": [
    "* 3,446 fraudulent transactions were not flagged by our model (false negatives). Each costs its purchase_value.\n",
    "* 2,385 legitimate transactions were flagged as fraudulent by our model (false positives). Each costs 8 dollars.\n",
    "* 6,709 fraudulent transactions were correctly flagged (true positives), each saving its purchase_value.\n",
    "* The rest were correctly predicted as not fraudulent (true negatives).\n",
    "\n",
    "### Filtering out one-second transactions\n",
    "\n",
    "Filtering out the one second transactions (with over 99% predicted fraud), we get the following confusion matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72fb08f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "filt = y_pred < .99\n",
    "create_confusion(y_true, df_income, filt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12cad049",
   "metadata": {},
   "source": [
    "These results look far less impressive, which is expected: among the remaining transactions, the only signal found was in the repeated device IDs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2ee820a",
   "metadata": {},
   "source": [
    "### Calculating mean cost\n",
    "\n",
    "We can adapt the `calc_income` function from above to return the average cost per transaction.\n",
    "\n",
    "* The minimum price to flag is calculated with the `calc_min_price` function defined above\n",
    "* The transaction is flagged if the purchase_value is greater than this minimum price\n",
    "* The costs of a false positive (-8) and a false negative (-purchase_value) are computed\n",
    "* The average cost over all transactions is returned"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f914dd8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def mean_cost(y_true, y_pred, purchase_value):\n",
    "    # Per-transaction break-even price from each predicted fraud probability\n",
    "    min_price = calc_min_price(y_pred)\n",
    "    is_flag = purchase_value > min_price\n",
    "    is_fraud = y_true == 1\n",
    "    false_pos = is_flag & ~is_fraud\n",
    "    false_neg = ~is_flag & is_fraud\n",
    "\n",
    "    # Boolean masks multiply as 0/1, so each row keeps only the cost that applies\n",
    "    false_pos_cost = false_pos * -INCORRECT_FRAUD_COST\n",
    "    false_neg_cost = false_neg * -purchase_value\n",
    "    cost = false_pos_cost + false_neg_cost\n",
    "    return cost.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb820436",
   "metadata": {},
   "source": [
    "An average cost of -1.46 is the result of this model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d63e7918",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "mean_cost(y_true, y_pred, df_train['purchase_value'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5b74472",
   "metadata": {},
   "source": [
    "### Country encoding\n",
    "\n",
    "There are well over 100 different countries in our dataset, but many of them appear just a handful of times. Let's look at the countries that appear fewer than 100 times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "896a0a2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train['country'].value_counts()[lambda x: x < 100]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1723ce61",
   "metadata": {},
   "source": [
    "While there may be signal in some of these countries, we'll create a function to convert them to missing values, leaving only the countries that appear at least 100 times in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26a61a23",
   "metadata": {},
   "outputs": [],
   "source": [
    "def encode_country(s, min_count=100, return_frame=True):\n",
    "    low_ct_countries = s.value_counts()[lambda x: x < min_count].index\n",
    "    s = s.mask(s.isin(low_ct_countries)).cat.remove_unused_categories()\n",
    "    if return_frame:\n",
    "        return s.to_frame()\n",
    "    return s\n",
    "\n",
    "ft_encode_country = FunctionTransformer(encode_country)"
   ]
  },
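  {
   "cell_type": "markdown",
   "id": "6ad3e8f1",
   "metadata": {},
   "source": [
    "A minimal sketch of the masking step on toy data, assuming a cutoff of 2 instead of 100 so the effect is visible. The values and the names `toy_country`, `counts`, `rare`, and `toy_encoded` are hypothetical. The rare country becomes a missing value and its category is removed, while the frequent countries are left untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ad3e8f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy column, stored as categorical like the country column\n",
    "toy_country = pd.Series(['US', 'US', 'US', 'FR', 'FR', 'PE'], dtype='category')\n",
    "counts = toy_country.value_counts()\n",
    "# Categories that appear fewer than 2 times\n",
    "rare = counts[counts < 2].index\n",
    "# Replace rare values with NaN, then drop the now-unused categories\n",
    "toy_encoded = toy_country.mask(toy_country.isin(rare)).cat.remove_unused_categories()\n",
    "toy_encoded"
   ]
  },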
  {
   "cell_type": "markdown",
   "id": "12e4a878",
   "metadata": {},
   "source": [
    "A separate pipeline is used to process this column and one-hot encode it. The new missing values (countries that appear fewer than 100 times) will all be treated as the same category."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16155cdb",
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe_countries = Pipeline([\n",
    "    ('country_agg', ft_encode_country),\n",
    "    ('country_ohe', OneHotEncoder(handle_unknown='ignore'))\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3af2f79",
   "metadata": {},
   "source": [
    "### Binning age\n",
    "\n",
    "The `KBinsDiscretizer` transformer automatically bins the age variable. Below, it is configured with five quantile-based bins.\n",
    "\n",
    "### Leaving the purchase_value as continuous\n",
    "\n",
    "We 'passthrough' the purchase_value column without transforming it to leave it as a continuous variable.\n",
    "\n",
    "### Adding to the pipeline\n",
    "\n",
    "The country, age, and purchase_value columns are added to the column transformer, and the final pipeline is recreated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87e74a34",
   "metadata": {},
   "outputs": [],
   "source": [
    "ct = ColumnTransformer([\n",
    "    ('cat', OneHotEncoder(), ['source', 'browser', 'sex']),\n",
    "    ('one_second', ft_one_second, ['signup_time', 'purchase_time']),\n",
    "    ('dev', ft_dupe_device, 'device_id'),\n",
    "    ('country', pipe_countries, 'country'),\n",
    "    ('age_bin', KBinsDiscretizer(n_bins=5, strategy='quantile'), ['age']),\n",
    "    ('cont', 'passthrough', ['purchase_value'])\n",
    "])\n",
    "lr = LogisticRegression(max_iter=1000)\n",
    "pipe = Pipeline([\n",
    "    ('ct', ct), \n",
    "    ('lr', lr)\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60ac7083",
   "metadata": {},
   "source": [
    "We re-train the new model and calculate the average cost once again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "396c3bc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe.fit(df_train, y_true);\n",
    "y_pred = pipe.predict_proba(df_train)[:, 1]\n",
    "mean_cost(y_true, y_pred, df_train['purchase_value'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bb13220",
   "metadata": {},
   "source": [
    "This cost is nearly identical to the simpler model, suggesting that very few decisions were changed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cc625ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "min_price = calc_min_price(y_pred)\n",
    "df_income = calc_income(df_train, min_price)\n",
    "df_income.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d536a181",
   "metadata": {},
   "source": [
    "We verify that the confusion matrix is very similar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9129a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "create_confusion(y_true, df_income)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfa7632a",
   "metadata": {},
   "source": [
    "## Evaluating the model on the test dataset\n",
    "\n",
    "We can use the held-out test dataset to evaluate our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0b6cf2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_test_true = df_test['class']\n",
    "y_test_pred = pipe.predict_proba(df_test)[:, 1]\n",
    "mean_cost(y_test_true, y_test_pred, df_test['purchase_value'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d684ea2d",
   "metadata": {},
   "source": [
    "This cost of -1.66 is worse (more negative) than the cost on the training set, likely because the test set contains no transactions that took place in one second. Those were all fraudulent and therefore easier to detect."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a200a78",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "A simple model flagging all transactions that occurred in one second and those that had a repeated device ID with price above 29 dollars appears to capture most of the potential value. More investigation needs to be done to determine if other models would be able to find more signal in the data to flag more transactions as fraud."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5542b968",
   "metadata": {},
   "source": [
    "## Future work\n",
    "\n",
    "* Use cross-validation to tune the hyperparameters of the model (penalty for logistic regression, number of bins for age, minimum count for countries, etc.)\n",
    "* Evaluate the significance of each variable. Do any variables besides time to purchase and device ID provide signal?\n",
    "* Try different machine learning models such as random forests\n",
    "* Automate the entire workflow and serialize the model to disk"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
